Config¶

The VIP configuration is stored in Nextflow configuration files. An additional configuration file can be supplied on the command-line to overwrite default parameter values, add/update profiles, configure processes and update environment variables.
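
As a rough illustration, an additional configuration file could look like the sketch below. It assumes that the keys listed on this page live under the Nextflow `params` scope; the file name and the chosen values are hypothetical.

```groovy
// custom.config (hypothetical file name)
// Assumption: VIP parameters are exposed under the Nextflow 'params' scope.

// narrow the default 'VUS,LP,P' class list
params.vcf.filter.classes = 'LP,P'

// skip short tandem repeat detection (default: true)
params.cram.call_str = false
```

Only the values set in such a file are overwritten; every other parameter keeps its default.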

Tip

VIP enables the Slurm executor if available. Configuration options are listed here.

Parameters¶

key	default	description
assembly	GRCh38	output assembly, allowed values: [GRCh38]
GRCh37.reference.chain.GRCh38		chain file to convert GRCh37 to GRCh38 data
GRCh37.reference.fasta
GRCh37.reference.fastaFai
GRCh37.reference.fastaGzi
GRCh38.reference.fasta	installed	GCA_000001405.15_GRCh38_no_alt_analysis_set
GRCh38.reference.fastaFai	installed
GRCh38.reference.fastaGzi	installed
T2T.reference.chain.GRCh38	installed	chain file to convert T2T to GRCh38 data
T2T.reference.fasta
T2T.reference.fastaFai
T2T.reference.fastaGzi

Warning: when using a different reference fasta.gz, the unzipped reference fasta file is also required. Both the zipped and the unzipped fasta should have an index.
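
A minimal sketch of pointing VIP at a different GRCh38 reference, assuming the keys in the table map onto nested Nextflow `params` blocks. All paths are hypothetical.

```groovy
params {
  GRCh38 {
    reference {
      // compressed reference plus its .fai and .gzi indexes (hypothetical paths)
      fasta    = '/data/reference/my_GRCh38.fna.gz'
      fastaFai = '/data/reference/my_GRCh38.fna.gz.fai'
      fastaGzi = '/data/reference/my_GRCh38.fna.gz.gzi'
    }
  }
}
```

The warning above still applies: the unzipped fasta and its index must be available as well. This sketch does not cover where they need to be placed.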

FASTQ¶

key	default	description
GRCh38.reference.fastaMmi	installed	for details, see here
fastp.options		for details, see here
minimap2.soft_clipping	true	In SAM output, use soft clipping for supplementary alignments (required when STR calling with Straglr)
minimap2.nanopore_preset	lr:hq	Preset to use for aligning Nanopore data, options: 'lr:hq' 'map-ont'.
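
For example, the Nanopore alignment preset could be switched with a fragment like the following. This is a sketch that assumes the keys sit under the `params` scope.

```groovy
params {
  minimap2 {
    // use 'map-ont' instead of the default 'lr:hq' preset
    nanopore_preset = 'map-ont'
    // keep soft clipping enabled, as required for STR calling with Straglr
    soft_clipping = true
  }
}
```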

CRAM¶

key	default	description
cnv.spectre.GRCh38.blacklist	installed	blacklist in bed format for sites that will be ignored
cnv.spectre.GRCh38.metadata	installed	metadata file for Ns removal, update this file only when using a different GRCh38 version than the one provided by VIP.
cram.call_snv	true	enable/disable the detection of short variants
cram.call_str	true	enable/disable the detection of short tandem repeats
cram.call_sv	true	enable/disable the detection of structural variants. disable this manually in case of non-paired-end Illumina data.
snv.deeptrio.illumina.WES.model_name	WES	for details, see here
snv.deeptrio.illumina.WGS.model_name	WGS	for details, see here
snv.deeptrio.nanopore.model_name	ONT	for details, see here
snv.deeptrio.pacbio_hifi.model_name	PACBIO	for details, see here
snv.deepvariant.illumina.WES.model_name	WES	for details, see here
snv.deepvariant.illumina.WGS.model_name	WGS	for details, see here
snv.deepvariant.nanopore.model_name	ONT_R104	for details, see here
snv.deepvariant.pacbio_hifi.model_name	PACBIO	for details, see here
snv.glnexus.WES.preset	DeepVariantWES	for details, see here. allowed values: [DeepVariant, DeepVariantWES, DeepVariantWES_MED_DP, DeepVariant_unfiltered]
snv.glnexus.WGS.preset	DeepVariantWGS	for details, see here. allowed values: [DeepVariant, DeepVariantWGS, DeepVariant_unfiltered]
snv.whatshap.output_read_list		Write reads that have been used for phasing to FILE.
snv.whatshap.algorithm	whatshap	Phasing algorithm to use. allowed values: [whatshap, hapchat, heuristic]
snv.whatshap.merge_reads		Merge reads which are likely to come from the same haplotype
snv.whatshap.row_limit	256	For the heuristic: Specifies the maximum number of memorized intermediate solutions. Larger values increase runtime and memory consumption, but can improve phasing quality.
snv.whatshap.internal_downsampling	15	Coverage reduction parameter in the internal core phasing algorithm. Higher values increase runtime exponentially while possibly improving phasing quality marginally. Avoid using this in the normal case!
snv.whatshap.mapping_quality	20	Minimum mapping quality
snv.whatshap.only_snvs		Phase only SNVs
snv.whatshap.ignore_read_groups		Ignore read groups in BAM/CRAM header and assume all reads come from the same sample.
snv.whatshap.error_rate	0.15	The probability that a nucleotide is wrong in read merging model.
snv.whatshap.maximum_error_rate	0.25	The maximum error rate of any edge of the read merging graph before discarding it.
snv.whatshap.threshold	1000000	The threshold of the ratio between the probabilities that a pair of reads come from the same haplotype and different haplotypes in the read merging model.
snv.whatshap.negative_threshold	1000	The threshold of the ratio between the probabilities that a pair of reads come from different haplotypes and the same haplotype in the read merging model.
snv.whatshap.distrust_genotypes		Allow switching variants from hetero- to homozygous in an optimal solution (see documentation).
snv.whatshap.include_homozygous		Also work on homozygous variants, which might be turned to heterozygous
snv.whatshap.default_gq	30	Default genotype quality used as cost of changing a genotype when no genotype likelihoods are available
snv.whatshap.gl_regularizer	None	Constant (float) to be used to regularize genotype likelihoods read from input VCF.
snv.whatshap.changed_genotype_list		Write list of changed genotypes to FILE.
snv.whatshap.recombination_list		Write putative recombination events to FILE.
snv.whatshap.recombrate	1.26	Recombination rate in cM/Mb (used with --ped). If given, a constant recombination rate is assumed.
snv.whatshap.genmap		File with genetic map (used with --ped) to be used instead of constant recombination rate, i.e. overrides option --recombrate.
snv.whatshap.no_genetic_haplotyping		Do not merge blocks that are not connected by reads (i.e. solely based on genotype status). Default: when in --ped mode, merge all blocks that contain at least one homozygous genotype in at least one individual into one block.
snv.whatshap.use_ped_samples		Only work on samples mentioned in the provided PED file.
snv.whatshap.use_supplementary		Also use supplementary alignments (default: ignore supplementary alignments)
snv.whatshap.supplementary_distance	100000	Skip supplementary alignments further than DIST bp away from the primary alignment
str.expansionhunter.aligner	dag-aligner	for details, see here. allowed values: [dag-aligner, path-aligner]
str.expansionhunter.analysis_mode	streaming	for details, see here. allowed values: [seeking, streaming]
str.expansionhunter.log_level	warn	for details, see here. allowed values: [trace, debug, info, warn, error]
str.expansionhunter.region_extension_length	1000	for details, see here
str.expansionhunter.GRCh38.variant_catalog	installed	for details, see here
str.straglr.min_support	2	minimum number of support reads for an expansion to be captured in genome-scan, see here
str.straglr.min_cluster_size	2	minimum number of reads required to constitute a cluster (allele) in GMM clustering, see here
str.straglr.GRCh38.loci	installed	from here
sv.cutesv.batches	10000000	Batch of genome segmentation interval
sv.cutesv.gt_round	500	Maximum round of iteration for alignments searching if perform genotyping
sv.cutesv.include_bed		Only detect SVs in regions in the BED file
sv.cutesv.ivcf		Enable to perform force calling using the given vcf file
sv.cutesv.max_size	100000	Maximum size of SV to be reported. All SVs are reported when using -1
sv.cutesv.max_split_parts	7	Maximum number of split segments a read may be aligned before it is ignored. All split segments are considered when using -1. (-1 is recommended when applying assembly-based alignment.)
sv.cutesv.merge_del_threshold	0	Maximum distance of deletion signals to be merged
sv.cutesv.merge_ins_threshold	100	Maximum distance of insertion signals to be merged
sv.cutesv.min_mapq	20	Minimum mapping quality value of alignment to be taken into account (recommend 10 for force calling)
sv.cutesv.min_read_len	500	Ignores reads that only report alignments no longer than this number of bp
sv.cutesv.min_siglength	10	Minimum length of SV signal to be extracted
sv.cutesv.min_size	30	Minimum size of SV to be reported
sv.cutesv.min_support	2	Minimum number of reads that support a SV to be reported. Please note that the default is lower than the default of cuteSV itself to prevent missed SV calls.
sv.cutesv.read_range	1000	The interval range for counting reads distribution
sv.cutesv.report_readid	false	Enable to report supporting read ids for each SV
sv.cutesv.retain_work_dir	false	Enable to retain temporary folder and files
sv.cutesv.write_old_sigs	false	Enable to output temporary sig files
sv.cutesv.nanopore.diff_ratio_filtering_TRA	0.6	Filter breakpoints with basepair identity less than this value for translocation
sv.cutesv.nanopore.diff_ratio_merging_DEL	0.3	Do not merge breakpoints with basepair identity more than this value for deletion
sv.cutesv.nanopore.diff_ratio_merging_INS	0.3	Do not merge breakpoints with basepair identity more than this value for insertion
sv.cutesv.nanopore.max_cluster_bias_DEL	100	Maximum distance to cluster reads together for deletion
sv.cutesv.nanopore.max_cluster_bias_DUP	500	Maximum distance to cluster reads together for duplication
sv.cutesv.nanopore.max_cluster_bias_INS	100	Maximum distance to cluster reads together for insertion
sv.cutesv.nanopore.max_cluster_bias_INV	500	Maximum distance to cluster reads together for inversion
sv.cutesv.nanopore.max_cluster_bias_TRA	50	Maximum distance to cluster reads together for translocation
sv.cutesv.nanopore.remain_reads_ratio	1.0	The ratio of reads remaining in a cluster. Set lower when the alignment data is of high quality, but a value over 0.5 is recommended
sv.cutesv.pacbio_hifi.diff_ratio_filtering_TRA	0.6	Filter breakpoints with basepair identity less than this value for translocation
sv.cutesv.pacbio_hifi.diff_ratio_merging_DEL	0.5	Do not merge breakpoints with basepair identity more than this value for deletion
sv.cutesv.pacbio_hifi.diff_ratio_merging_INS	0.9	Do not merge breakpoints with basepair identity more than this value for insertion
sv.cutesv.pacbio_hifi.max_cluster_bias_DEL	1000	Maximum distance to cluster reads together for deletion
sv.cutesv.pacbio_hifi.max_cluster_bias_DUP	500	Maximum distance to cluster reads together for duplication
sv.cutesv.pacbio_hifi.max_cluster_bias_INS	1000	Maximum distance to cluster reads together for insertion
sv.cutesv.pacbio_hifi.max_cluster_bias_INV	500	Maximum distance to cluster reads together for inversion
sv.cutesv.pacbio_hifi.max_cluster_bias_TRA	50	Maximum distance to cluster reads together for translocation
sv.cutesv.pacbio_hifi.remain_reads_ratio	1.0	The ratio of reads remaining in a cluster. Set lower when the alignment data is of high quality, but a value over 0.5 is recommended
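
The CRAM parameters can be overridden in the same way. The fragment below is a sketch for non-paired-end Illumina exome data, assuming the keys sit under the `params` scope. The combination of values is illustrative only.

```groovy
params {
  cram {
    // structural variant detection has to be disabled manually
    // for non-paired-end Illumina data
    call_sv = false
  }
  // pick another allowed GLnexus preset for WES (default: DeepVariantWES)
  snv.glnexus.WES.preset = 'DeepVariantWES_MED_DP'
}
```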

gVCF¶

key	default	description
gvcf.merge_preset	DeepVariant	allowed values: [gatk, gatk_unfiltered, DeepVariant, DeepVariant_unfiltered]

VCF¶

key	default	description
vcf.start		allowed values: [normalize, annotate, classify, filter, inheritance, classify_samples, filter_samples]. for reanalysis this defines from which step to start the workflow
vcf.annotate.annotsv_cache_dir	installed
vcf.annotate.ensembl_gene_mapping	installed
vcf.annotate.vep_buffer_size	1000	for details, see here
vcf.annotate.vep_cache_dir	installed
vcf.annotate.vep_plugin_dir	installed
vcf.annotate.vep_plugin_green_db_enabled	false	enabling is only allowed for academic use, for details see here
vcf.annotate.vep_plugin_hpo	installed
vcf.annotate.vep_plugin_inheritance	installed
vcf.annotate.vep_plugin_spliceai_enabled	true	Using SpliceAI is not allowed for commercial use (disabling SpliceAI also disables Capice).
vcf.annotate.vep_plugin_vkgl_mode	1	allowed values: [0=full VKGL, 1=public VKGL]. update `vcf.annotate.GRCh38.vep_plugin_vkgl` accordingly
vcf.annotate.GRCh38.capice_model	installed
vcf.annotate.GRCh38.stranger_catalog	installed	for details, see here
vcf.annotate.GRCh38.vep_custom_phylop	installed
vcf.annotate.GRCh38.vep_plugin_clinvar	installed
vcf.annotate.GRCh38.vep_plugin_gnomad	installed
vcf.annotate.GRCh38.vep_plugin_green_db	installed
vcf.annotate.GRCh38.vep_plugin_spliceai_indel	installed
vcf.annotate.GRCh38.vep_plugin_spliceai_snv	installed
vcf.annotate.GRCh38.vep_plugin_utrannotator	installed
vcf.annotate.GRCh38.vep_plugin_vkgl	installed	update `vcf.annotate.vep_plugin_vkgl_mode` accordingly
vcf.classify.annotate_path	1	allowed values: [0=false, 1=true]. annotate variant-consequences with classification tree path
vcf.classify.GRCh38.decision_tree	installed	for details, see here
vcf.classify_samples.annotate_path	1	allowed values: [0=false, 1=true]. annotate variant-consequences per sample with classification tree path
vcf.classify_samples.GRCh38.decision_tree	installed	for details, see here
vcf.filter.classes	VUS,LP,P	for details, see here
vcf.filter.consequences	true	allowed values: [true, false]. true: filter individual consequences, false: keep all consequences for a variant if one consequence filter passes.
vcf.filter_samples.classes	U1,U2	for details, see here
vcf.normalize.allow_invalid_ref	true	throw error or warn when incorrect or missing REF allele is encountered.
vcf.report.gado_genes	installed
vcf.report.gado_hpo	installed
vcf.report.gado_predict_info	installed
vcf.report.gado_predict_matrix	installed
vcf.report.include_crams	true	allowed values: [true, false]. true: include CRAM files in the report to show alignments in the genome browser. false: do not include CRAM files; no alignments are shown in the genome browser, which results in a smaller report.
vcf.report.max_records
vcf.report.max_samples
vcf.report.config		`vcf.report.template` configuration file
vcf.report.template		for details, see here
vcf.report.GRCh38.genes	installed
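
As an illustration, a commercial user who also wants a smaller report might combine a few of the VCF parameters as sketched below. This assumes the keys sit under the `params` scope, and the selection of values is hypothetical.

```groovy
params {
  vcf {
    annotate {
      // SpliceAI is not allowed for commercial use;
      // disabling it also disables Capice
      vep_plugin_spliceai_enabled = false
    }
    filter {
      // keep all consequences of a variant if one consequence passes the filter
      consequences = false
    }
    report {
      // leave CRAM files out of the report to reduce its size
      include_crams = false
    }
  }
}
```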

Profiles¶

VIP pre-defines two profiles. The default profile is Slurm with fallback to local in case Slurm cannot be discovered.

key	description
local	for details, see here
slurm	for details, see here

Additional profiles (for details, see here) can be added to your configuration file and used on the command-line, for example to run VIP on Amazon, Azure or Google Cloud.
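
A rough sketch of an additional profile, using standard Nextflow settings for AWS Batch. The profile name, queue, region and bucket are hypothetical, and a real cloud setup for VIP is likely to need more configuration than shown here.

```groovy
profiles {
  // 'my_aws' is a hypothetical profile name
  my_aws {
    process.executor = 'awsbatch'
    process.queue    = 'my-batch-queue'          // hypothetical AWS Batch job queue
    aws.region       = 'eu-west-1'               // hypothetical region
    workDir          = 's3://my-bucket/vip-work' // hypothetical S3 work directory
  }
}
```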

Process¶

By default, each process gets assigned 4 cpus, 8GB of memory and a max runtime of 4 hours. Depending on your system specifications and your analysis you might need to use updated values. For information on how to update process configuration see the Nextflow documentation. The following sections list all processes and their non-default configuration.

FASTQ¶

process label	configuration
concat_fastq	default
concat_fastq_paired_end	default
minimap2_align	cpus=8 memory='16GB' time='23h'
minimap2_align_paired_end	cpus=8 memory='16GB' time='23h'

CRAM¶

process label	configuration
concat_vcf	default
coverage	cpus=1 memory='16GB' time=default
cram_validate	default
cutesv_call	cpus=4 memory='8GB' time='5h'
deepvariant_call	cpus=default memory='2GB * cpus' time='5h'
deepvariant_call_duo	cpus=default memory='4GB * cpus' time='23h'
deepvariant_call_trio	cpus=default memory='4GB * cpus' time='23h'
deepvariant_concat_gvcf	cpus=default memory='2GB' time='30m'
deepvariant_concat_vcf	cpus=default memory='2GB' time='30m'
deepvariant_joint_call	cpus=default memory='2GB' time='30m'
expansionhunter_call	cpus=4 memory='16GB' time='5h'
manta_joint_call	cpus=4 memory='8GB' time='5h'
publish_vcf	memory='100MB' time='30m'
spectre_call	cpus=default memory='4GB' time=default
straglr_call	default
vcf_merge_str	default
vcf_merge_sv	default
whatshap	cpus=default memory=default time='23h'

gVCF¶

process label	configuration
gvcf_liftover	default
gvcf_validate	memory='100MB' time='30m'
gvcf_merge	memory='2GB' time='30m'

VCF¶

process label	configuration
vcf_annotate	cpus=4 memory='8GB' time='4h'
vcf_annotate_publish	default
vcf_classify	memory='2GB'
vcf_classify_publish	default
vcf_classify_samples	memory='2GB'
vcf_classify_samples_publish	default
vcf_concat	default
vcf_filter	default
vcf_filter_samples	default
vcf_inheritance	memory='2GB'
vcf_liftover	default
vcf_normalize	default
vcf_report	memory='4GB'
vcf_slice	default
vcf_split	memory='100MB' time='30m'
vcf_validate	memory='100MB' time='30m'
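
The per-process settings listed above can be adjusted in an additional configuration file. The fragment below is a sketch using the Nextflow `withLabel` selector, on the assumption that the process labels in the tables are the labels Nextflow matches on. The resource values are illustrative.

```groovy
process {
  // raise the general defaults (documented as 4 cpus, 8GB, 4 hours)
  cpus   = 4
  memory = '16GB'
  time   = '8h'

  // give long-read alignment more room than the documented 8 cpus / 16GB / 23h
  withLabel: 'minimap2_align' {
    cpus   = 16
    memory = '32GB'
    time   = '47h'
  }
}
```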

Environment¶

See https://github.com/molgenis/vip/tree/main/config for an overview of available environment variables. Notably, this allows the use of different Apptainer containers for the tools that VIP relies on.
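
Environment variables are set in the Nextflow `env` scope. The sketch below shows only the general shape: the variable name, image path and tool name are made up, so look up the actual variable names in the config directory linked above before overriding anything.

```groovy
env {
  // CMD_EXAMPLE_TOOL is a hypothetical variable name used for illustration only
  CMD_EXAMPLE_TOOL = 'apptainer exec /path/to/example_tool.sif example_tool'
}
```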