Input

The `--input` value is a tab-separated file (sample-sheet) in which each row describes the data and metadata of one sample.

A minimal sample-sheet for the vcf workflow needs only the required columns, `individual_id` and `vcf`.
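As an illustration, the Python sketch below writes a minimal vcf-workflow sample-sheet with the two required columns, `individual_id` and `vcf`. It assumes a header row that names the columns. The sample identifiers and the path `/data/project.vcf.gz` are hypothetical placeholders. Both samples point at the same file because the `vcf` value must be the same for all project samples.

```python
import csv
import io


def write_sample_sheet(rows: list[dict], columns: list[str]) -> str:
    """Serialize rows as a tab-separated sample-sheet preceded by a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical samples: the identifiers and the VCF path are placeholders.
rows = [
    {"individual_id": "sample0", "vcf": "/data/project.vcf.gz"},
    {"individual_id": "sample1", "vcf": "/data/project.vcf.gz"},
]
sheet = write_sample_sheet(rows, ["individual_id", "vcf"])
print(sheet, end="")
```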

Sample-sheet values are case sensitive. Columns can contain values of the following types:

type	description
boolean	allowed values: [`true`, `false`]
enum	categorical value
file	absolute file path, or file path relative to the sample-sheet
file list	comma-separated list of file paths
string	text
string list	comma-separated list of strings
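The value types above map onto a few small parsing rules. The following Python sketch is a hypothetical helper set, not VIP's own parser. Booleans accept only the exact strings `true` and `false`, since values are case sensitive. List types are split on commas, and relative file paths are resolved against the directory that contains the sample-sheet. Two behaviors are assumptions: whitespace around commas is not trimmed, and an empty cell yields an empty list.

```python
from pathlib import Path


def parse_boolean(value: str) -> bool:
    # Values are case sensitive: only "true" and "false" are accepted.
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"invalid boolean: {value!r}")


def parse_list(value: str) -> list[str]:
    # 'file list' and 'string list' values are comma-separated; an empty cell yields [].
    return value.split(",") if value else []


def resolve_file(value: str, sample_sheet: Path) -> Path:
    # Absolute paths are kept; relative paths are resolved against the sample-sheet's directory.
    path = Path(value)
    return path if path.is_absolute() else sample_sheet.parent / path
```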

The following sections describe the columns that can be used in every sample-sheet, followed by the workflow-specific columns.

Columns

Columns available in every workflow:

column	type	required	default	description
`project_id`	`string`		`vip`	project identifier
`family_id`	`string`		`fam<index>`	family identifier
`individual_id`	`string`	yes		sample identifier of the individual
`paternal_id`	`string`			sample identifier of the father
`maternal_id`	`string`			sample identifier of the mother
`sex`	`enum`		unknown sex	allowed values: [`male`, `female`]. an unknown sex causes the Spectre CNV analysis to assume `female` when determining the ploidy of chromosome X
`affected`	`boolean`		unknown affected status	whether the individual is affected
`proband`	`boolean`		depends¹	individual being reported on
`hpo_ids`	`string list`			regex: `/HP:\d{7}/`, terms from HPO v2024-08-13. each term must be a child of 'Phenotypic abnormality' (HP:0000118)
`sequencing_method`	`enum`		`WGS`	allowed values: [`WES`,`WGS`], value must be the same for all project samples
`regions`	`file`			allowed file extensions: [`bed`]. filter variants overlapping with regions in the bed file
`pcr_performed`	`boolean`		`false`	whether PCR was performed to generate the data. if so, tools that are incompatible with PCR data are disabled

¹ Exception: if no probands are defined in the sample-sheet then all samples are considered to be probands.
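Two rules in the table above lend themselves to a quick check: the `hpo_ids` format and the `proband` default from footnote ¹. The Python sketch below uses hypothetical helper names. It checks only the `HP:` plus seven digits format. Verifying that a term is a child of HP:0000118 requires the ontology itself and is not attempted. For the proband default, it assumes that only an explicit `true` counts as a defined proband and that samples are dicts with an optional boolean `proband` key.

```python
import re

# Format check only; ancestry under HP:0000118 needs the HPO ontology.
HPO_ID = re.compile(r"HP:\d{7}")


def parse_hpo_ids(value: str) -> list[str]:
    """Split a comma-separated hpo_ids value and check the format of each id."""
    ids = value.split(",") if value else []
    for hpo_id in ids:
        if not HPO_ID.fullmatch(hpo_id):
            raise ValueError(f"invalid HPO id: {hpo_id!r}")
    return ids


def resolve_probands(samples: list[dict]) -> list[dict]:
    """Apply the proband default: if no sample is a proband, all samples are."""
    # Assumption: only an explicit True counts as "defined"; None means an empty cell.
    any_defined = any(sample.get("proband") is True for sample in samples)
    return [
        {**sample, "proband": (sample.get("proband") is True) if any_defined else True}
        for sample in samples
    ]
```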

Columns specific to the fastq workflow:

column	type	required	default	description
`adaptive_sampling`	`file`			allowed file extensions: [`csv`]. for `nanopore` adaptive sampling experiments, used to filter `stop_receiving` reads
`fastq`	`file list`	yes²		allowed file extensions: [`fastq`, `fastq.gz`, `fq`, `fq.gz`]. single-read file(s)
`fastq_r1`	`file list`	yes²		allowed file extensions: [`fastq`, `fastq.gz`, `fq`, `fq.gz`]. paired-end reads file(s) #1
`fastq_r2`	`file list`	yes²		allowed file extensions: [`fastq`, `fastq.gz`, `fq`, `fq.gz`]. paired-end reads file(s) #2
`sequencing_platform`	`enum`		`nanopore`	allowed values: [`illumina`,`nanopore`,`pacbio_hifi`], value must be the same for all project samples

² Either `fastq` or both `fastq_r1` and `fastq_r2` are required.
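Footnote ² can be expressed as a small validation step. The following Python sketch uses a hypothetical `fastq_inputs` helper. It returns single-read files as 1-tuples and paired-end files as `(r1, r2)` tuples. Three behaviors are assumptions, not documented rules: the n-th `fastq_r1` file pairs with the n-th `fastq_r2` file, both lists must have the same length, and `fastq` takes precedence when both variants are filled in.

```python
def fastq_inputs(row: dict) -> list[tuple[str, ...]]:
    """Return single-read files as 1-tuples, or paired-end files as (r1, r2) tuples."""

    def split(column: str) -> list[str]:
        # file list values are comma-separated; missing or empty cells yield no files
        return [path for path in (row.get(column) or "").split(",") if path]

    single, r1, r2 = split("fastq"), split("fastq_r1"), split("fastq_r2")
    if single:
        return [(path,) for path in single]
    if r1 and r2:
        # Assumption: the n-th fastq_r1 file pairs with the n-th fastq_r2 file.
        if len(r1) != len(r2):
            raise ValueError("fastq_r1 and fastq_r2 must list the same number of files")
        return list(zip(r1, r2))
    raise ValueError("either fastq or both fastq_r1 and fastq_r2 are required")
```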

Columns specific to the cram workflow:

column	type	required	default	description
`cram`	`file`	yes		allowed file extensions: [`bam`, `cram`, `sam`]
`sequencing_platform`	`enum`		`illumina`	allowed values: [`illumina`,`nanopore`,`pacbio_hifi`], value must be the same for all project samples

Columns specific to the gvcf workflow:

column	type	required	default	description
`assembly`	`enum`		`GRCh38`	allowed values: [`GRCh37`, `GRCh38`, `T2T`]
`gvcf`	`file`	yes		allowed file extensions: [`gvcf`, `gvcf.gz`, `gvcf.bgz`, `vcf`, `vcf.gz`, `vcf.bgz`, `bcf`, `bcf.gz`, `bcf.bgz`]
`cram`	`file`			allowed file extensions: [`bam`, `cram`, `sam`]
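Several allowed extensions above consist of multiple parts, such as `vcf.gz` or `gvcf.bgz`. A check that only looks at the last suffix would see just `.gz`. The Python sketch below is a hypothetical helper that compares the complete suffix instead. The extension lists are copied from the gvcf and cram rows. As an assumption, the comparison is case sensitive.

```python
# Allowed extensions copied from the gvcf and cram rows of the table.
GVCF_EXTENSIONS = ["gvcf", "gvcf.gz", "gvcf.bgz", "vcf", "vcf.gz", "vcf.bgz", "bcf", "bcf.gz", "bcf.bgz"]
CRAM_EXTENSIONS = ["bam", "cram", "sam"]


def has_allowed_extension(path: str, allowed: list[str]) -> bool:
    # Match the complete suffix so multi-part extensions such as vcf.gz are handled;
    # pathlib's Path.suffix would only return ".gz" for "sample.vcf.gz".
    return any(path.endswith("." + extension) for extension in allowed)
```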

Columns specific to the vcf workflow:

column	type	required	default	description
`assembly`	`enum`		`GRCh38`	allowed values: [`GRCh37`, `GRCh38`, `T2T`], value must be the same for all project samples
`vcf`	`file`	yes		allowed file extensions: [`vcf`, `vcf.gz`, `vcf.bgz`, `bcf`, `bcf.gz`, `bcf.bgz`], value must be the same for all project samples
`cram`	`file`			allowed file extensions: [`bam`, `cram`, `sam`]
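Several columns carry the constraint "value must be the same for all project samples". This applies to `sequencing_method`, `sequencing_platform`, `assembly` and `vcf`. The Python sketch below shows one way to enforce it, using a hypothetical `check_project_consistency` helper. It groups samples by `project_id`, which defaults to `vip`, and substitutes the column default for empty cells. This means an explicit `GRCh38` and an empty `assembly` cell are treated as equal.

```python
from collections import defaultdict


def check_project_consistency(samples, column, default=None):
    """Raise ValueError if samples of the same project disagree on `column`."""
    values_per_project = defaultdict(set)
    for sample in samples:
        # Empty cells fall back to the column default; project_id defaults to "vip".
        project = sample.get("project_id") or "vip"
        values_per_project[project].add(sample.get(column) or default)
    for project, values in values_per_project.items():
        if len(values) > 1:
            found = ", ".join(sorted(str(value) for value in values))
            raise ValueError(f"{column} differs within project {project!r}: {found}")


# Usage: an explicit GRCh38 and an empty cell agree once the default is applied.
check_project_consistency([{"assembly": "GRCh38"}, {}], "assembly", default="GRCh38")
```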