Input

The --input value is a tab-separated file (sample-sheet) with each row describing the data and metadata of a sample.

A minimal sample-sheet for the vcf workflow could look like this:

individual_id	vcf
sample0	sample0.vcf.gz
sample1	sample1.vcf.gz
sample2	sample2.vcf.gz
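
As an illustration, a sheet like the one above could be generated with a short script. This is a minimal sketch using Python's standard `csv` module; the helper name `write_sample_sheet` is hypothetical and not part of the pipeline.

```python
import csv
import io


def write_sample_sheet(rows, handle):
    """Write rows (a list of dicts sharing the same keys) as a tab-separated sample sheet."""
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()  # first line holds the column names
    writer.writerows(rows)  # one line per sample


# Same three samples as in the example sheet above.
rows = [
    {"individual_id": f"sample{i}", "vcf": f"sample{i}.vcf.gz"} for i in range(3)
]

buffer = io.StringIO()
write_sample_sheet(rows, buffer)
sample_sheet_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as `handle`.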

Sample-sheet values are case-sensitive. Columns can contain values of the following types:

| type | description |
| --- | --- |
| boolean | allowed values: [true, false] |
| enum | categorical value |
| file | absolute file path or file path relative to the sample-sheet |
| file list | comma-separated list of file paths |
| string | text |
| string list | comma-separated list of strings |
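
The value types could be interpreted along these lines. This is only a sketch under stated assumptions: the function names are hypothetical, paths are treated as POSIX paths, and list items are not stripped of surrounding whitespace because the text above does not say whether spaces after commas are allowed.

```python
from pathlib import PurePosixPath


def parse_boolean(value: str) -> bool:
    # Values are case-sensitive, so only the exact strings "true" and "false" pass.
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"invalid boolean: {value!r}")


def parse_string_list(value: str) -> list[str]:
    # An empty cell is treated as an empty list (assumption).
    return value.split(",") if value else []


def resolve_file(value: str, sample_sheet: PurePosixPath) -> PurePosixPath:
    # Relative paths are resolved against the directory containing the sample sheet.
    path = PurePosixPath(value)
    return path if path.is_absolute() else sample_sheet.parent / path


def parse_file_list(value: str, sample_sheet: PurePosixPath) -> list[PurePosixPath]:
    return [resolve_file(item, sample_sheet) for item in parse_string_list(value)]
```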

The following sections describe the columns that can be used in every sample-sheet, followed by the workflow-specific columns.

Columns

| column | type | scope | required | default | description |
| --- | --- | --- | --- | --- | --- |
| project_id | string | project | | vip | project identifier |
| family_id | string | family | | fam<index> | family identifier |
| individual_id | string | sample | yes | | sample identifier of the individual |
| paternal_id | string | family | | | sample identifier of the father |
| maternal_id | string | family | | | sample identifier of the mother |
| sex | enum | sample | | unknown sex | values: [male,female]. Note that an unknown sex leads to a Spectre CNV analysis that assumes female when determining the ploidy of chromosome X. |
| affected | boolean | sample | | unknown affected status | whether the individual is affected |
| proband | boolean | sample | | depends¹ | individual being reported on |
| hpo_ids | string list | sample | | | regex: /HP:\d{7}/ from HPO v2024-08-13. Each term must be a child of 'Phenotypic abnormality' (HP:0000118) |
| sequencing_method | enum | project | | WGS | allowed values: [WES,WGS] |
| regions | file | project | | | allowed file extensions: [bed]. filter variants overlapping with regions in the BED file |
| pcr_performed | boolean | project | | false | indicates whether PCR was performed to obtain the data; if so, certain tools that are incompatible with such data are disabled |

¹ Exception: if no probands are defined in the sample-sheet, then all samples are considered to be probands.
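
Two of the rules above lend themselves to a small check: the `hpo_ids` pattern and the proband exception. The sketch below is illustrative only. The function names are hypothetical, "no probands defined" is assumed to mean that no sample has `proband` set to true, and the requirement that a term descends from HP:0000118 is not checked because that needs the ontology itself.

```python
import re

# Pattern taken from the hpo_ids column description.
HPO_ID = re.compile(r"HP:\d{7}")


def invalid_hpo_ids(value: str) -> list[str]:
    """Return the entries of a comma-separated hpo_ids value that do not match the pattern."""
    if not value:
        return []
    return [term for term in value.split(",") if not HPO_ID.fullmatch(term)]


def resolve_probands(samples: list[dict]) -> list[str]:
    """Return the individual_id of every proband.

    Each sample is a dict with 'individual_id' and an optional 'proband' (bool).
    If no sample is marked as proband, all samples are treated as probands.
    """
    explicit = [s["individual_id"] for s in samples if s.get("proband") is True]
    if explicit:
        return explicit
    return [s["individual_id"] for s in samples]
```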

Columns: FASTQ

| column | type | scope | required | default | description |
| --- | --- | --- | --- | --- | --- |
| adaptive_sampling | file | sample | | | allowed file extensions: [csv]. for nanopore adaptive sampling experiments, used to filter stop_receiving³ or sequence³ reads |
| fastq | file list | sample | yes² | | allowed file extensions: [fastq, fastq.gz, fq, fq.gz]. single-reads file(s) |
| fastq_r1 | file list | sample | yes² | | allowed file extensions: [fastq, fastq.gz, fq, fq.gz]. paired-end reads file(s) #1 |
| fastq_r2 | file list | sample | yes² | | allowed file extensions: [fastq, fastq.gz, fq, fq.gz]. paired-end reads file(s) #2 |
| sequencing_platform | enum | project | | nanopore | allowed values: [illumina,nanopore,pacbio_hifi] |

² Either the fastq column or both the fastq_r1 and fastq_r2 columns are required.

³ stop_receiving applies when a Nanopore adaptive sampling output file specification older than version 0.1 was used; sequence applies to version 0.1 and above.
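
The requirement on the FASTQ columns can be expressed as a small row check. This is a sketch with a hypothetical function name; it assumes an empty or missing cell means "not provided" and it does not reject rows that supply both single and paired-end reads, since the text does not say whether that combination is allowed.

```python
def fastq_input_errors(row: dict[str, str]) -> list[str]:
    """Return a list of problems with the FASTQ columns of one sample-sheet row."""
    has_single = bool(row.get("fastq"))
    has_r1 = bool(row.get("fastq_r1"))
    has_r2 = bool(row.get("fastq_r2"))

    errors = []
    # Paired-end input only makes sense when both mates are present.
    if has_r1 != has_r2:
        errors.append("fastq_r1 and fastq_r2 must be provided together")
    # At least one complete input (single reads or a full pair) is needed.
    if not has_single and not (has_r1 and has_r2):
        errors.append("either fastq or both fastq_r1 and fastq_r2 is required")
    return errors
```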

Columns: CRAM

| column | type | scope | required | default | description |
| --- | --- | --- | --- | --- | --- |
| cram | file | sample | yes | | allowed file extensions: [bam, cram, sam] |
| sequencing_platform | enum | project | | nanopore | allowed values: [illumina,nanopore,pacbio_hifi] |

Columns: gVCF

| column | type | scope | required | default | description |
| --- | --- | --- | --- | --- | --- |
| assembly | enum | project | | GRCh38 | allowed values: [GRCh37, GRCh38, T2T] |
| gvcf | file | sample | yes | | allowed file extensions: [gvcf, gvcf.gz, gvcf.bgz, vcf, vcf.gz, vcf.bgz, bcf, bcf.gz, bcf.bgz] |
| cram | file | sample | | | allowed file extensions: [bam, cram, sam] |

Columns: VCF

| column | type | scope | required | default | description |
| --- | --- | --- | --- | --- | --- |
| assembly | enum | project | | GRCh38 | allowed values: [GRCh37, GRCh38, T2T] |
| vcf | file | project | yes | | allowed file extensions: [vcf, vcf.gz, vcf.bgz, bcf, bcf.gz, bcf.bgz] |
| cram | file | sample | | | allowed file extensions: [bam, cram, sam] |
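
The allowed file extensions listed for the file columns can be checked before a run. The following is a rough sketch, not the pipeline's own validation: the names `ALLOWED_EXTENSIONS` and `has_allowed_extension` are hypothetical, and the comparison is assumed to be case-sensitive, in line with sample-sheet values being case-sensitive.

```python
# Extension lists copied from the column descriptions above.
ALLOWED_EXTENSIONS = {
    "fastq": ("fastq", "fastq.gz", "fq", "fq.gz"),
    "cram": ("bam", "cram", "sam"),
    "gvcf": (
        "gvcf", "gvcf.gz", "gvcf.bgz",
        "vcf", "vcf.gz", "vcf.bgz",
        "bcf", "bcf.gz", "bcf.bgz",
    ),
    "vcf": ("vcf", "vcf.gz", "vcf.bgz", "bcf", "bcf.gz", "bcf.bgz"),
}


def has_allowed_extension(column: str, filename: str) -> bool:
    """Check whether filename ends with one of the extensions allowed for the column."""
    suffixes = tuple("." + ext for ext in ALLOWED_EXTENSIONS[column])
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return filename.endswith(suffixes)
```

For example, `has_allowed_extension("vcf", "sample0.vcf.gz")` passes, while a `.cram` file in the `vcf` column does not.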