The --input value is a tab-separated file (sample-sheet) with each row describing the data and metadata of a sample.
A minimal sample-sheet for the vcf workflow could look like this:
| individual_id |
vcf |
| sample0 |
sample0.vcf.gz |
| sample1 |
sample1.vcf.gz |
| sample2 |
sample2.vcf.gz |
Sample-sheet values are case sensitive. Columns can contain values of different types:
| type |
description |
| boolean |
allowed values: [true, false] |
| enum |
categorical value |
| file |
absolute file path or file path relative to the sample sheet |
| file list |
comma-separated list of file paths |
| string |
text |
| string list |
comma-separated list of strings |
The following sections describe the columns that can be used in every sample-sheet followed by workflow
specific columns.
Columns
| column |
type |
required |
default |
description |
project_id |
string |
|
vip |
project identifier, see here |
family_id |
string |
|
fam<index> |
family identifier |
individual_id |
string |
yes |
|
sample identifier of the individual |
paternal_id |
string |
|
|
sample identifier of the father |
maternal_id |
string |
|
|
sample identifier of the mother |
sex |
enum |
|
unknown sex |
values: [male,female] Please note that an unknown sex leads to a Spectre CNV analysis that assumes female for the ploidy determination of chromosome X. |
affected |
boolean |
|
unknown affected status |
whether the individual is affected |
proband |
boolean |
|
depends1 |
individual being reported on |
hpo_ids |
string list |
|
|
regex: /HP:\d{7}/ from HPO v2024-08-13. term must be a child of 'Phenotypic abnormality' (HP:0000118) |
sequencing_method |
enum |
|
WGS |
allowed values: [WES,WGS], value must be the same for all project samples |
regions |
file |
|
|
allowed file extensions: [bed]. filter variants overlapping with regions in bed file |
pcr_performed |
boolean |
false |
false |
Indication if PCR was performed to get the data, if so certain tools will be disabled due to not being compatible with this data. |
1 Exception: if no probands are defined in the sample-sheet then all samples are considered to be probands.
Columns: FASTQ
| column |
type |
required |
default |
description |
adaptive_sampling |
file |
|
|
allowed file extensions: [csv]. for nanopore adaptive sampling experiments, used to filter stop_receiving reads |
fastq |
file list |
yes2 |
|
allowed file extensions: [fastq, fastq.gz, fq, fq.gz]. single-reads file(s) |
fastq_r1 |
file list |
yes2 |
|
allowed file extensions: [fastq, fastq.gz, fq, fq.gz]. paired-end reads file(s) #1 |
fastq_r2 |
file list |
yes2 |
|
allowed file extensions: [fastq, fastq.gz, fq, fq.gz]. paired-end reads file(s) #2 |
sequencing_platform |
enum |
|
nanopore |
allowed values: [illumina,nanopore,pacbio_hifi], value must be the same for all project samples |
2 Either the fastq or the fastq_r1 and fastq_r2 are required.
Columns: CRAM
| column |
type |
required |
default |
description |
cram |
file |
yes |
|
allowed file extensions: [bam, cram, sam] |
sequencing_platform |
enum |
|
illumina |
allowed values: [illumina,nanopore,pacbio_hifi], value must be the same for all project samples |
Columns: gVCF
| column |
type |
required |
default |
description |
assembly |
enum |
|
GRCh38 |
allowed values: [GRCh37, GRCh38, T2T] |
gvcf |
file |
yes |
|
allowed file extensions: [gvcf, gvcf.gz, gvcf.bgz, vcf, vcf.gz, vcf.bgz, bcf, bcf.gz, bcf.bgz] |
cram |
file |
|
|
allowed file extensions: [bam, cram, sam] |
Columns: VCF
| column |
type |
required |
default |
description |
assembly |
enum |
|
GRCh38 |
allowed values: [GRCh37, GRCh38, T2T], value must be the same for all project samples |
vcf |
file |
yes |
|
allowed file extensions: [vcf, vcf.gz, vcf.bgz, bcf, bcf.gz, bcf.bgz], value must be the same for all project samples |
cram |
file |
|
|
allowed file extensions: [bam, cram, sam] |