Config

The VIP configuration is stored in Nextflow configuration files. An additional configuration file can be supplied on the command-line to override default parameter values, add or update profiles, configure processes and update environment variables.
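
As a minimal sketch, an additional configuration file could look like the following. The file name and the chosen values are illustrative assumptions; the parameter keys are taken from the tables on this page, and the sketch assumes they live in the Nextflow `params` scope.

```groovy
// my_vip.cfg (hypothetical file name)

// the data was generated with PCR (default: false)
params.pcr_performed = true

// leave cram files out of the report to reduce its size (default: true)
params.vcf.report.include_crams = false
```

The nested block form (`params { vcf { report { include_crams = false } } }`) is equivalent in Nextflow configuration syntax. Assuming the VIP launcher exposes a `--config` option for this purpose, the file would be supplied as `--config my_vip.cfg`.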

Tip

VIP enables the Slurm executor if available. Configuration options are listed here.

Parameters

key default description
assembly GRCh38 output assembly, allowed values: [GRCh38]
GRCh37.reference.chain.GRCh38 chain file to convert GRCh37 to GRCh38 data
GRCh37.reference.fasta
GRCh37.reference.fastaFai
GRCh37.reference.fastaGzi
GRCh38.reference.fasta installed GCA_000001405.15_GRCh38_no_alt_analysis_set
GRCh38.reference.fastaFai installed
GRCh38.reference.fastaGzi installed
T2T.reference.chain.GRCh38 installed chain file to convert T2T to GRCh38 data
T2T.reference.fasta
T2T.reference.fastaFai
T2T.reference.fastaGzi
pcr_performed false Indicates whether PCR was performed to obtain the data. If so, tools that are not compatible with such data are disabled.

Warning: When using a different reference fasta.gz, the unzipped reference fasta file is also required. Both the zipped and the unzipped fasta must have an index.
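
A sketch of pointing VIP at a GRCh37 reference, whose keys have no default in the table above. All paths are hypothetical placeholders. Where VIP expects the unzipped fasta and its index is not stated in this section, so that part is left out here.

```groovy
// hypothetical paths, replace with the location of your own files
params.GRCh37.reference.fasta    = "/path/to/GRCh37/reference.fasta.gz"
params.GRCh37.reference.fastaFai = "/path/to/GRCh37/reference.fasta.gz.fai"
params.GRCh37.reference.fastaGzi = "/path/to/GRCh37/reference.fasta.gz.gzi"
```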

FASTQ

key default description
GRCh38.reference.fastaMmi installed for details, see here
fastp.options for details, see here
minimap2.soft_clipping true In SAM output, use soft clipping for supplementary alignments (required when STR calling with Straglr)
minimap2.nanopore_preset lr:hq Preset to use for aligning Nanopore data, options: 'lr:hq' 'map-ont'.
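
For example, switching the Nanopore alignment preset to the other allowed value could be sketched as follows (assuming the key is set in the `params` scope):

```groovy
// use the 'map-ont' preset instead of the default 'lr:hq'
params.minimap2.nanopore_preset = "map-ont"
```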

CRAM

key default description
cnv.spectre.GRCh38.blacklist installed blacklist in bed format for sites that will be ignored
cnv.spectre.GRCh38.metadata installed metadata file for Ns removal, update this file only when using a different GRCh38 version than the one provided by VIP.
cram.call_snv true enable/disable the detection of short variants
cram.call_str true enable/disable the detection of short tandem repeats
cram.call_sv true enable/disable the detection of structural variants. disable this manually in case of non-paired-end Illumina data.
snv.deeptrio.illumina.WES.model_name WES for details, see here
snv.deeptrio.illumina.WGS.model_name WGS for details, see here
snv.deeptrio.nanopore.model_name ONT for details, see here
snv.deeptrio.pacbio_hifi.model_name PACBIO for details, see here
snv.deepvariant.illumina.WES.model_name WES for details, see here
snv.deepvariant.illumina.WGS.model_name WGS for details, see here
snv.deepvariant.nanopore.model_name ONT_R104 for details, see here
snv.deepvariant.pacbio_hifi.model_name PACBIO for details, see here
snv.glnexus.WES.preset DeepVariantWES for details, see here. allowed values: [DeepVariant, DeepVariantWES, DeepVariantWES_MED_DP, DeepVariant_unfiltered]
snv.glnexus.WGS.preset DeepVariantWGS for details, see here. allowed values: [DeepVariant, DeepVariantWGS, DeepVariant_unfiltered]
snv.whatshap.output_read_list Write reads that have been used for phasing to FILE.
snv.whatshap.algorithm whatshap Phasing algorithm to use. allowed values: [whatshap, hapchat, heuristic]
snv.whatshap.merge_reads Merge reads which are likely to come from the same haplotype
snv.whatshap.row_limit 256 For the heuristic: Specifies the maximum number of memorized intermediate solutions. Larger values increase runtime and memory consumption, but can improve phasing quality.
snv.whatshap.internal_downsampling 15 Coverage reduction parameter in the internal core phasing algorithm. Higher values increase runtime exponentially while possibly improving phasing quality marginally. Avoid using this in the normal case!
snv.whatshap.mapping_quality 20 Minimum mapping quality
snv.whatshap.only_snvs Phase only SNVs
snv.whatshap.ignore_read_groups Ignore read groups in BAM/CRAM header and assume all reads come from the same sample.
snv.whatshap.error_rate 0.15 The probability that a nucleotide is wrong in read merging model.
snv.whatshap.maximum_error_rate 0.25 The maximum error rate of any edge of the read merging graph before discarding it.
snv.whatshap.threshold 1000000 The threshold of the ratio between the probabilities that a pair of reads come from the same haplotype and different haplotypes in the read merging model.
snv.whatshap.negative_threshold 1000 The threshold of the ratio between the probabilities that a pair of reads come from different haplotypes and the same haplotype in the read merging model.
snv.whatshap.distrust_genotypes Allow switching variants from hetero- to homozygous in an optimal solution (see documentation).
snv.whatshap.include_homozygous Also work on homozygous variants, which might be turned to heterozygous
snv.whatshap.default_gq 30 Default genotype quality used as cost of changing a genotype when no genotype likelihoods are available
snv.whatshap.gl_regularizer None Constant (float) to be used to regularize genotype likelihoods read from input VCF.
snv.whatshap.changed_genotype_list Write list of changed genotypes to FILE.
snv.whatshap.recombination_list Write putative recombination events to FILE.
snv.whatshap.recombrate 1.26 Recombination rate in cM/Mb (used with --ped). If given, a constant recombination rate is assumed.
snv.whatshap.genmap File with genetic map (used with --ped) to be used instead of constant recombination rate, i.e. overrides option --recombrate.
snv.whatshap.no_genetic_haplotyping Do not merge blocks that are not connected by reads (i.e. solely based on genotype status). Default: when in --ped mode, merge all blocks that contain at least one homozygous genotype in at least one individual into one block.
snv.whatshap.use_ped_samples Only work on samples mentioned in the provided PED file.
snv.whatshap.use_supplementary Also use supplementary alignments (default: ignore supplementary alignments)
snv.whatshap.supplementary_distance 100000 Skip supplementary alignments further than DIST bp away from the primary alignment
str.expansionhunter.aligner dag-aligner for details, see here. allowed values: [dag-aligner, path-aligner]
str.expansionhunter.analysis_mode streaming for details, see here. allowed values: [seeking, streaming]
str.expansionhunter.log_level warn for details, see here. allowed values: [trace, debug, info, warn, error]
str.expansionhunter.region_extension_length 1000 for details, see here
str.expansionhunter.GRCh38.variant_catalog installed for details, see here
str.straglr.min_support 2 minimum number of support reads for an expansion to be captured in genome-scan, see here
str.straglr.min_cluster_size 2 minimum number of reads required to constitute a cluster (allele) in GMM clustering, see here
str.straglr.GRCh38.loci installed from here
sv.cutesv.batches 10000000 Batch of genome segmentation interval
sv.cutesv.gt_round 500 Maximum number of iteration rounds for alignment searching when genotyping is performed
sv.cutesv.include_bed Only detect SVs in regions in the BED file
sv.cutesv.ivcf Enable to perform force calling using the given vcf file
sv.cutesv.max_size 100000 Maximum size of SV to be reported. All SVs are reported when using -1
sv.cutesv.max_split_parts 7 Maximum number of split segments a read may be aligned before it is ignored. All split segments are considered when using -1. (Recommend -1 when applying assembly-based alignment.)
sv.cutesv.merge_del_threshold 0 Maximum distance of deletion signals to be merged
sv.cutesv.merge_ins_threshold 100 Maximum distance of insertion signals to be merged
sv.cutesv.min_mapq 20 Minimum mapping quality value of alignment to be taken into account (recommend 10 for force calling)
sv.cutesv.min_read_len 500 Ignores reads that only report alignments not longer than this value (in bp)
sv.cutesv.min_siglength 10 Minimum length of SV signal to be extracted
sv.cutesv.min_size 30 Minimum size of SV to be reported
sv.cutesv.min_support 2 Minimum number of reads that support an SV to be reported. Please note that the default is lower than the default of cuteSV itself to prevent missed SV calls.
sv.cutesv.read_range 1000 The interval range for counting reads distribution
sv.cutesv.report_readid false Enable to report supporting read ids for each SV
sv.cutesv.retain_work_dir false Enable to retain temporary folder and files
sv.cutesv.write_old_sigs false Enable to output temporary sig files
sv.cutesv.nanopore.diff_ratio_filtering_TRA 0.6 Filter breakpoints with basepair identity less than this value for translocation
sv.cutesv.nanopore.diff_ratio_merging_DEL 0.3 Do not merge breakpoints with basepair identity more than this value for deletion
sv.cutesv.nanopore.diff_ratio_merging_INS 0.3 Do not merge breakpoints with basepair identity more than this value for insertion
sv.cutesv.nanopore.max_cluster_bias_DEL 100 Maximum distance to cluster reads together for deletion
sv.cutesv.nanopore.max_cluster_bias_DUP 500 Maximum distance to cluster reads together for duplication
sv.cutesv.nanopore.max_cluster_bias_INS 100 Maximum distance to cluster reads together for insertion
sv.cutesv.nanopore.max_cluster_bias_INV 500 Maximum distance to cluster reads together for inversion
sv.cutesv.nanopore.max_cluster_bias_TRA 50 Maximum distance to cluster reads together for translocation
sv.cutesv.nanopore.remain_reads_ratio 1.0 The ratio of reads that remain in a cluster. Set lower when the alignment data are of high quality, but a value over 0.5 is recommended
sv.cutesv.pacbio_hifi.diff_ratio_filtering_TRA 0.6 Filter breakpoints with basepair identity less than this value for translocation
sv.cutesv.pacbio_hifi.diff_ratio_merging_DEL 0.5 Do not merge breakpoints with basepair identity more than this value for deletion
sv.cutesv.pacbio_hifi.diff_ratio_merging_INS 0.9 Do not merge breakpoints with basepair identity more than this value for insertion
sv.cutesv.pacbio_hifi.max_cluster_bias_DEL 1000 Maximum distance to cluster reads together for deletion
sv.cutesv.pacbio_hifi.max_cluster_bias_DUP 500 Maximum distance to cluster reads together for duplication
sv.cutesv.pacbio_hifi.max_cluster_bias_INS 1000 Maximum distance to cluster reads together for insertion
sv.cutesv.pacbio_hifi.max_cluster_bias_INV 500 Maximum distance to cluster reads together for inversion
sv.cutesv.pacbio_hifi.max_cluster_bias_TRA 50 Maximum distance to cluster reads together for translocation
sv.cutesv.pacbio_hifi.remain_reads_ratio 1.0 The ratio of reads that remain in a cluster. Set lower when the alignment data are of high quality, but a value over 0.5 is recommended
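
The keys in this table map onto nested `params` entries. A hedged sketch for a non-paired-end Illumina exome run follows; the scenario and the chosen values are illustrative assumptions, not recommendations.

```groovy
// structural variant detection must be disabled manually
// for non-paired-end Illumina data (default: true)
params.cram.call_sv = false

// pick another allowed GLnexus preset for WES data (default: DeepVariantWES)
params.snv.glnexus.WES.preset = "DeepVariantWES_MED_DP"

// require a higher minimum mapping quality during phasing (default: 20)
params.snv.whatshap.mapping_quality = 30
```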

gVCF

key default description
gvcf.merge_preset DeepVariant allowed values: [gatk, gatk_unfiltered, DeepVariant, DeepVariant_unfiltered]

VCF

key default description
vcf.start allowed values: [normalize, annotate, classify, filter, inheritance, classify_samples, filter_samples]. for reanalysis this defines from which step to start the workflow
vcf.annotate.annotsv_cache_dir installed
vcf.annotate.ensembl_gene_mapping installed
vcf.annotate.vep_buffer_size 1000 for details, see here
vcf.annotate.vep_cache_dir installed
vcf.annotate.vep_plugin_dir installed
vcf.annotate.vep_plugin_hpo installed
vcf.annotate.vep_plugin_inheritance installed
vcf.annotate.vep_plugin_vkgl_mode 1 allowed values: [0=full VKGL, 1=public VKGL]. update vcf.annotate.GRCh38.vep_plugin_vkgl accordingly
vcf.annotate.GRCh38.capice_model installed
vcf.annotate.GRCh38.stranger_catalog installed for details, see here
vcf.annotate.GRCh38.vep_custom_phylop installed
vcf.annotate.GRCh38.vep_plugin_clinvar installed
vcf.annotate.GRCh38.vep_plugin_gnomad installed
vcf.annotate.GRCh38.vep_plugin_green_db_enabled false enabling is only allowed for academic use, for details see here
vcf.annotate.GRCh38.vep_plugin_green_db installed
vcf.annotate.GRCh38.vep_plugin_spliceai_indel installed
vcf.annotate.GRCh38.vep_plugin_spliceai_snv installed
vcf.annotate.GRCh38.vep_plugin_utrannotator installed
vcf.annotate.GRCh38.vep_plugin_vkgl installed update vcf.annotate.vep_plugin_vkgl_mode accordingly
vcf.classify.annotate_path 1 allowed values: [0=false, 1=true]. annotate variant-consequences with classification tree path
vcf.classify.GRCh38.decision_tree installed for details, see here
vcf.classify_samples.annotate_path 1 allowed values: [0=false, 1=true]. annotate variant-consequences per sample with classification tree path
vcf.classify_samples.GRCh38.decision_tree installed for details, see here
vcf.filter.classes VUS,LP,P for details, see here
vcf.filter.consequences true allowed values: [true, false]. true: filter individual consequences, false: keep all consequences for a variant if one consequence filter passes.
vcf.filter_samples.classes U1,U2 for details, see here
vcf.report.gado_genes installed
vcf.report.gado_hpo installed
vcf.report.gado_predict_info installed
vcf.report.gado_predict_matrix installed
vcf.report.include_crams true allowed values: [true, false]. true: include cram files in the report for showing alignments in the genome browser, false: do not include cram files in the report, so no alignments are shown in the genome browser. This results in a smaller report size.
vcf.report.max_records
vcf.report.max_samples
vcf.report.config configuration file for vcf.report.template
vcf.report.template for details, see here
vcf.report.GRCh38.genes installed
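
As an illustration of the reanalysis use case, the sketch below restarts the VCF workflow at the filter step with a stricter class selection. The values are assumptions chosen from the allowed values and defaults listed above.

```groovy
// start the workflow at the filter step when reanalysing
params.vcf.start = "filter"

// keep only likely pathogenic and pathogenic classes (default: "VUS,LP,P")
params.vcf.filter.classes = "LP,P"

// keep all consequences of a variant if one consequence passes (default: true)
params.vcf.filter.consequences = false
```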

Profiles

VIP pre-defines two profiles. The default profile is Slurm with fallback to local in case Slurm cannot be discovered.

key description
local for details, see here
slurm for details, see here

Additional profiles (for details, see here) can be added to your configuration file and used on the command-line, for example to run VIP on Amazon, Azure or Google Cloud.
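
A rough sketch of such an additional profile, using standard Nextflow options for AWS Batch. The profile name, queue name and region are hypothetical, and a working cloud setup needs more than this (for example a work directory in cloud storage and suitable container settings).

```groovy
profiles {
  my_aws {                       // hypothetical profile name
    process {
      executor = 'awsbatch'      // Nextflow executor for AWS Batch
      queue = 'my-batch-queue'   // hypothetical AWS Batch job queue
    }
    aws {
      region = 'eu-west-1'       // illustrative region
    }
  }
}
```

The profile would then be selected by name on the command-line, assuming the VIP launcher passes a profile option through to Nextflow.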

Process

By default, each process is assigned 4 cpus, 8GB of memory and a maximum runtime of 4 hours. Depending on your system specifications and your analysis, you might need to adjust these values. For information on how to update process configuration, see the Nextflow documentation. The following sections list all processes and their non-default configuration.

FASTQ

process label configuration
concat_fastq default
concat_fastq_paired_end default
minimap2_align cpus=8 memory='16GB' time='23h'
minimap2_align_paired_end cpus=8 memory='16GB' time='23h'

CRAM

process label configuration
concat_vcf default
coverage cpus=1 memory='16GB' time=default
cram_validate default
cutesv_call cpus=4 memory='8GB' time='5h'
deepvariant_call cpus=default memory='2GB * cpus' time='5h'
deepvariant_call_duo cpus=default memory='4GB * cpus' time='23h'
deepvariant_call_trio cpus=default memory='4GB * cpus' time='23h'
deepvariant_concat_gvcf cpus=default memory='2GB' time='30m'
deepvariant_concat_vcf cpus=default memory='2GB' time='30m'
deepvariant_joint_call cpus=default memory='2GB' time='30m'
expansionhunter_call cpus=4 memory='16GB' time='5h'
manta_joint_call cpus=4 memory='8GB' time='5h'
spectre_call cpus=default memory='4GB' time=default
straglr_call default
vcf_merge_str default
vcf_merge_sv default
whatshap cpus=default memory=default time='23h'
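
Entries such as memory='2GB * cpus' suggest memory that scales with the number of cpus. One way this could be expressed in an additional configuration file is a Nextflow dynamic directive, sketched below. It assumes that the names in the "process label" column can be used with `withLabel`, and the values are illustrative rather than VIP's actual settings.

```groovy
process {
  withLabel: 'deepvariant_call' {
    cpus = 8
    // closure is evaluated per task: 2 GB per cpu, so 16 GB with 8 cpus
    memory = { 2.GB * task.cpus }
    time = '10h'
  }
}
```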

gVCF

process label configuration
gvcf_liftover default
gvcf_validate memory='100MB' time='30m'
gvcf_merge memory='2GB' time='30m'

VCF

process label configuration
vcf_annotate cpus=4 memory='8GB' time='4h'
vcf_annotate_publish default
vcf_classify memory='2GB'
vcf_classify_publish default
vcf_classify_samples memory='2GB'
vcf_classify_samples_publish default
vcf_concat default
vcf_filter default
vcf_filter_samples default
vcf_inheritance memory='2GB'
vcf_liftover default
vcf_normalize default
vcf_report memory='4GB'
vcf_slice default
vcf_split memory='100MB' time='30m'
vcf_validate memory='100MB' time='30m'
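
A minimal sketch of overriding these settings in an additional configuration file, using the standard Nextflow `process` scope. It assumes the names in the "process label" column are usable as `withLabel` selectors. The values are illustrative.

```groovy
process {
  // generic settings apply to every process without a more specific selector
  cpus = 2
  memory = '4GB'
  time = '2h'

  // label selectors take precedence over the generic settings above
  withLabel: 'vcf_annotate' {
    cpus = 8
    memory = '16GB'
    time = '8h'
  }
}
```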

Environment

See https://github.com/molgenis/vip/tree/main/config for an overview of available environment variables. Notably, this allows you to use different Apptainer containers for the tools that VIP relies on.
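
Environment variables can be set in the Nextflow `env` scope of an additional configuration file. In the sketch below, the variable name and path are placeholders, not real VIP variable names. Look up the actual names in the linked config directory.

```groovy
env {
  // HYPOTHETICAL_TOOL_CONTAINER is a placeholder, not an actual VIP variable
  HYPOTHETICAL_TOOL_CONTAINER = "/path/to/custom_tool_image.sif"
}
```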