The `--input` value is a tab-separated file (sample-sheet) with each row describing the data and metadata of a sample.
A minimal sample-sheet for the `vcf` workflow could look like this:
| individual_id | vcf |
|---|---|
| sample0 | sample0.vcf.gz |
| sample1 | sample1.vcf.gz |
| sample2 | sample2.vcf.gz |
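As a rough illustration, the minimal sample-sheet shown here could be generated with Python's standard `csv` module. This is only a sketch: it assumes the file starts with a header row holding the column names, and the function name `write_sample_sheet` is hypothetical.

```python
import csv
import io

# Rows mirror the minimal example; the file names are placeholders.
rows = [
    {"individual_id": "sample0", "vcf": "sample0.vcf.gz"},
    {"individual_id": "sample1", "vcf": "sample1.vcf.gz"},
    {"individual_id": "sample2", "vcf": "sample2.vcf.gz"},
]


def write_sample_sheet(rows, handle):
    """Write rows as tab-separated text with a header line (assumed layout)."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["individual_id", "vcf"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_sample_sheet(rows, buffer)
sample_sheet_text = buffer.getvalue()
print(sample_sheet_text, end="")
```

To write to disk instead, open the target file with `newline=""` and pass the handle to `write_sample_sheet`.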
Sample-sheet values are case sensitive. Columns can contain values of different types:
| type | description |
|---|---|
| boolean | allowed values: [`true`, `false`] |
| enum | categorical value |
| file | absolute file path or file path relative to the sample sheet |
| file list | comma-separated list of file paths |
| string | text |
| string list | comma-separated list of strings |
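The value types above can be read as simple parsing rules. The following is a minimal sketch of how a script might interpret them, not the workflow's own parser; the helper names `parse_boolean`, `parse_list` and `resolve_file` are hypothetical, and the handling of empty cells is an assumption.

```python
from pathlib import PurePosixPath


def parse_boolean(value):
    """Accept only the exact, case-sensitive strings 'true' and 'false'."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_list(value):
    """Split a comma-separated cell; an empty cell gives an empty list (assumption)."""
    return value.split(",") if value else []


def resolve_file(value, sample_sheet_path):
    """Return absolute paths unchanged; resolve others against the sample-sheet folder."""
    path = PurePosixPath(value)
    if path.is_absolute():
        return path
    return PurePosixPath(sample_sheet_path).parent / path


print(parse_boolean("true"))
print(parse_list("a_R1.fastq.gz,b_R1.fastq.gz"))
print(resolve_file("sample0.vcf.gz", "/data/project/samples.tsv"))
```

`PurePosixPath` is used so the example behaves the same on every operating system; a real script would more likely use `pathlib.Path`.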
The following sections describe the columns that can be used in every sample-sheet, followed by the workflow-specific columns.
Columns
| column | type | required | default | description |
|---|---|---|---|---|
| project_id | string | | vip | project identifier, see here |
| family_id | string | | `fam<index>` | family identifier |
| individual_id | string | yes | | sample identifier of the individual |
| paternal_id | string | | | sample identifier of the father |
| maternal_id | string | | | sample identifier of the mother |
| sex | enum | | unknown sex | values: [`male`, `female`]. Please note that an unknown sex leads to a Spectre CNV analysis that assumes female for the ploidy determination of chromosome X. |
| affected | boolean | | unknown affected status | whether the individual is affected |
| proband | boolean | | depends¹ | individual being reported on |
| hpo_ids | string list | | | regex: `/HP:\d{7}/` from HPO v2024-08-13. term must be a child of 'Phenotypic abnormality' (HP:0000118) |
| sequencing_method | enum | | WGS | allowed values: [`WES`, `WGS`], value must be the same for all project samples |
| regions | file | | | allowed file extensions: [`bed`]. filter variants overlapping with regions in bed file² |

¹ Exception: if no probands are defined in the sample-sheet then all samples are considered to be probands.
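Two of the rules above lend themselves to a short sketch: the format check for `hpo_ids` and the proband exception. The code below is illustrative only. The function names are hypothetical, the regular expression covers only the identifier format (not membership in the HPO release or descent from HP:0000118), and applying the proband exception across the whole sample-sheet rather than per project is an assumption.

```python
import re

# Identifier format taken from the hpo_ids column description.
HPO_ID = re.compile(r"HP:\d{7}")


def valid_hpo_ids(cell):
    """Check that every comma-separated term matches the HP:nnnnnnn format."""
    if not cell:
        return True  # the column is optional
    return all(HPO_ID.fullmatch(term) for term in cell.split(","))


def resolve_probands(samples):
    """Return the individual_ids treated as probands.

    If no sample is marked proband=true, all samples are considered probands.
    """
    flagged = [s["individual_id"] for s in samples if s.get("proband") == "true"]
    if flagged:
        return flagged
    return [s["individual_id"] for s in samples]


samples = [
    {"individual_id": "sample0", "proband": ""},
    {"individual_id": "sample1", "proband": ""},
]
print(resolve_probands(samples))
print(valid_hpo_ids("HP:0000118,HP:0001250"))
```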
Columns: FASTQ
| column | type | required | default | description |
|---|---|---|---|---|
| adaptive_sampling | file | | | allowed file extensions: [`csv`]. for nanopore adaptive sampling experiments, used to filter `stop_receiving` reads |
| fastq | file list | yes³ | | allowed file extensions: [`fastq`, `fastq.gz`, `fq`, `fq.gz`]. single-reads file(s) |
| fastq_r1 | file list | yes³ | | allowed file extensions: [`fastq`, `fastq.gz`, `fq`, `fq.gz`]. paired-end reads file(s) #1 |
| fastq_r2 | file list | yes³ | | allowed file extensions: [`fastq`, `fastq.gz`, `fq`, `fq.gz`]. paired-end reads file(s) #2 |
| sequencing_platform | enum | | nanopore | allowed values: [`illumina`, `nanopore`, `pacbio_hifi`], value must be the same for all project samples |

³ Either the `fastq` or the `fastq_r1` and `fastq_r2` are required.
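The requirement on the FASTQ columns can be expressed as a small pre-flight check. This is a sketch with a hypothetical function name; whether a row may fill in both the single-read and the paired-end columns is not stated here, so the check accepts that case as an assumption.

```python
def has_required_fastq(row):
    """Return True if the row provides fastq, or both fastq_r1 and fastq_r2."""
    single = bool(row.get("fastq"))
    paired = bool(row.get("fastq_r1")) and bool(row.get("fastq_r2"))
    return single or paired


# Example rows as dictionaries keyed by column name; file names are placeholders.
print(has_required_fastq({"individual_id": "sample0", "fastq": "sample0.fastq.gz"}))
print(has_required_fastq({"individual_id": "sample1", "fastq_r1": "sample1_R1.fastq.gz"}))
```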
Columns: CRAM
| column | type | required | default | description |
|---|---|---|---|---|
| cram | file | yes | | allowed file extensions: [`bam`, `cram`, `sam`] |
| sequencing_platform | enum | | illumina | allowed values: [`illumina`, `nanopore`, `pacbio_hifi`], value must be the same for all project samples |
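The allowed file extensions can be checked before starting a run. The sketch below covers only the `cram` column and uses a hypothetical helper name; it assumes extensions are matched case-sensitively, in line with the statement that sample-sheet values are case sensitive.

```python
# Extensions listed for the cram column.
CRAM_EXTENSIONS = ("bam", "cram", "sam")


def has_allowed_extension(path, extensions=CRAM_EXTENSIONS):
    """Return True if the path ends with one of the allowed extensions."""
    suffixes = tuple("." + extension for extension in extensions)
    return path.endswith(suffixes)


print(has_allowed_extension("sample0.cram"))
print(has_allowed_extension("sample0.txt"))
```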
Columns: gVCF
| column | type | required | default | description |
|---|---|---|---|---|
| assembly | enum | | GRCh38 | allowed values: [`GRCh37`, `GRCh38`, `T2T`] |
| gvcf | file | yes | | allowed file extensions: [`gvcf`, `gvcf.gz`, `gvcf.bgz`, `vcf`, `vcf.gz`, `vcf.bgz`, `bcf`, `bcf.gz`, `bcf.bgz`] |
| cram | file | | | allowed file extensions: [`bam`, `cram`, `sam`] |
Columns: VCF
| column | type | required | default | description |
|---|---|---|---|---|
| assembly | enum | | GRCh38 | allowed values: [`GRCh37`, `GRCh38`, `T2T`], value must be the same for all project samples |
| vcf | file | yes | | allowed file extensions: [`vcf`, `vcf.gz`, `vcf.bgz`, `bcf`, `bcf.gz`, `bcf.bgz`], value must be the same for all project samples |
| cram | file | | | allowed file extensions: [`bam`, `cram`, `sam`] |
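Several columns carry the constraint that the value must be the same for all project samples. A check along those lines could look like the sketch below. The function name is hypothetical, and it assumes that an empty cell takes the documented default (for example `vip` for `project_id` and `GRCh38` for `assembly`) before values are compared.

```python
from collections import defaultdict


def inconsistent_projects(rows, column, default):
    """Map each project_id to its distinct values for column, if more than one."""
    values = defaultdict(set)
    for row in rows:
        project = row.get("project_id") or "vip"  # documented default project
        values[project].add(row.get(column) or default)
    return {
        project: sorted(found)
        for project, found in values.items()
        if len(found) > 1
    }


rows = [
    {"project_id": "p1", "individual_id": "sample0", "assembly": "GRCh37"},
    {"project_id": "p1", "individual_id": "sample1", "assembly": "GRCh38"},
    {"project_id": "", "individual_id": "sample2", "assembly": ""},
]
print(inconsistent_projects(rows, "assembly", "GRCh38"))
```

An empty result means every project uses a single value for the checked column.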