Config

The VIP configuration is stored in Nextflow configuration files. An additional configuration file can be supplied on the command line to override default parameter values, add or update profiles, configure processes, and update environment variables.
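
As an illustration, an additional configuration file that overrides two defaults might look like the sketch below. The parameter keys are taken from the tables on this page; the file name and the chosen values are assumptions made for the example.

```groovy
// custom.config (hypothetical file name; values are illustrative only)
params {
    // default is false; set to true when PCR was performed to obtain the data
    pcr_performed = true

    // nested keys such as minimap2.nanopore_preset map onto nested blocks
    minimap2 {
        nanopore_preset = 'map-ont'   // default is 'lr:hq'
    }
}
```

Only the keys that appear in the file are intended to change; everything else should keep its documented default.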

Parameters

| key | default | description |
| --- | --- | --- |
| assembly | GRCh38 | output assembly, allowed values: [GRCh38] |
| GRCh37.reference.chain.GRCh38 | installed | chain file to convert GRCh37 to GRCh38 data |
| GRCh37.reference.fasta | installed | |
| GRCh37.reference.fastaFai | installed | |
| GRCh37.reference.fastaGzi | installed | |
| GRCh38.reference.fasta | installed | GCA_000001405.15_GRCh38_no_alt_analysis_set |
| GRCh38.reference.fastaFai | installed | |
| GRCh38.reference.fastaGzi | installed | |
| T2T.reference.chain.GRCh38 | installed | chain file to convert T2T to GRCh38 data |
| T2T.reference.fasta | | |
| T2T.reference.fastaFai | | |
| T2T.reference.fastaGzi | | |
| pcr_performed | false | indicates whether PCR was performed to obtain the data; if so, tools that are not compatible with such data are disabled |

Warning: When using a different reference fasta.gz, the unzipped reference fasta file is also required. Both the zipped and the unzipped fasta must have an index.

FASTQ

| key | default | description |
| --- | --- | --- |
| GRCh38.reference.fastaMmi | installed | for details, see here |
| fastp.options | | for details, see here |
| minimap2.soft_clipping | true | In SAM output, use soft clipping for supplementary alignments (required for STR calling with Straglr) |
| minimap2.nanopore_preset | lr:hq | Preset to use for aligning Nanopore data, allowed values: [lr:hq, map-ont] |

CRAM

| key | default | description |
| --- | --- | --- |
| cnv.spectre.GRCh38.blacklist | installed | blacklist in BED format for sites that will be ignored |
| cnv.spectre.GRCh38.metadata | installed | metadata file for Ns removal; update this file only when using a GRCh38 version other than the one provided by VIP |
| cram.call_snv | true | enable/disable the detection of short variants |
| cram.call_str | true | enable/disable the detection of short tandem repeats |
| cram.call_sv | true | enable/disable the detection of structural variants; disable this manually for non-paired-end Illumina data |
| snv.deeptrio.illumina.WES.model_name | WES | for details, see here |
| snv.deeptrio.illumina.WGS.model_name | WGS | for details, see here |
| snv.deeptrio.nanopore.model_name | ONT | for details, see here |
| snv.deeptrio.pacbio_hifi.model_name | PACBIO | for details, see here |
| snv.deepvariant.illumina.WES.model_name | WES | for details, see here |
| snv.deepvariant.illumina.WGS.model_name | WGS | for details, see here |
| snv.deepvariant.nanopore.model_name | ONT_R104 | for details, see here |
| snv.deepvariant.pacbio_hifi.model_name | PACBIO | for details, see here |
| snv.glnexus.WES.preset | DeepVariantWES | for details, see here. allowed values: [DeepVariant, DeepVariantWES, DeepVariantWES_MED_DP, DeepVariant_unfiltered] |
| snv.glnexus.WGS.preset | DeepVariantWGS | for details, see here. allowed values: [DeepVariant, DeepVariantWGS, DeepVariant_unfiltered] |
| str.expansionhunter.aligner | dag-aligner | for details, see here. allowed values: [dag-aligner, path-aligner] |
| str.expansionhunter.analysis_mode | streaming | for details, see here. allowed values: [seeking, streaming] |
| str.expansionhunter.log_level | warn | for details, see here. allowed values: [trace, debug, info, warn, error] |
| str.expansionhunter.region_extension_length | 1000 | for details, see here |
| str.expansionhunter.GRCh38.variant_catalog | installed | for details, see here |
| str.straglr.min_support | 2 | minimum number of supporting reads for an expansion to be captured in genome-scan, see here |
| str.straglr.min_cluster_size | 2 | minimum number of reads required to constitute a cluster (allele) in GMM clustering, see here |
| str.straglr.GRCh38.loci | installed | from here |
| sv.cutesv.batches | 10000000 | Batch of genome segmentation interval |
| sv.cutesv.gt_round | 500 | Maximum number of iteration rounds for alignment searching when genotyping |
| sv.cutesv.include_bed | | Only detect SVs in regions in the BED file |
| sv.cutesv.ivcf | | Enable to perform force calling using the given VCF file |
| sv.cutesv.max_size | 100000 | Maximum size of SV to be reported. All SVs are reported when using -1 |
| sv.cutesv.max_split_parts | 7 | Maximum number of split segments a read may be aligned before it is ignored. All split segments are considered when using -1. (Recommend -1 when applying assembly-based alignment.) |
| sv.cutesv.merge_del_threshold | 0 | Maximum distance of deletion signals to be merged |
| sv.cutesv.merge_ins_threshold | 100 | Maximum distance of insertion signals to be merged |
| sv.cutesv.min_mapq | 20 | Minimum mapping quality value of alignment to be taken into account (recommend 10 for force calling) |
| sv.cutesv.min_read_len | 500 | Ignore reads that only report alignments no longer than this value (in bp) |
| sv.cutesv.min_siglength | 10 | Minimum length of SV signal to be extracted |
| sv.cutesv.min_size | 30 | Minimum size of SV to be reported |
| sv.cutesv.min_support | 10 | Minimum number of reads that support an SV to be reported |
| sv.cutesv.read_range | 1000 | The interval range for counting read distribution |
| sv.cutesv.report_readid | false | Enable to report supporting read ids for each SV |
| sv.cutesv.retain_work_dir | false | Enable to retain temporary folder and files |
| sv.cutesv.write_old_sigs | false | Enable to output temporary sig files |
| sv.cutesv.nanopore.diff_ratio_filtering_TRA | 0.6 | Filter breakpoints with basepair identity less than this value for translocation |
| sv.cutesv.nanopore.diff_ratio_merging_DEL | 0.3 | Do not merge breakpoints with basepair identity more than this value for deletion |
| sv.cutesv.nanopore.diff_ratio_merging_INS | 0.3 | Do not merge breakpoints with basepair identity more than this value for insertion |
| sv.cutesv.nanopore.max_cluster_bias_DEL | 100 | Maximum distance to cluster reads together for deletion |
| sv.cutesv.nanopore.max_cluster_bias_DUP | 500 | Maximum distance to cluster reads together for duplication |
| sv.cutesv.nanopore.max_cluster_bias_INS | 100 | Maximum distance to cluster reads together for insertion |
| sv.cutesv.nanopore.max_cluster_bias_INV | 500 | Maximum distance to cluster reads together for inversion |
| sv.cutesv.nanopore.max_cluster_bias_TRA | 50 | Maximum distance to cluster reads together for translocation |
| sv.cutesv.nanopore.remain_reads_ratio | 1.0 | The ratio of reads remaining in a cluster. Set lower when the alignment data have high quality; a value over 0.5 is recommended |
| sv.cutesv.pacbio_hifi.diff_ratio_filtering_TRA | 0.6 | Filter breakpoints with basepair identity less than this value for translocation |
| sv.cutesv.pacbio_hifi.diff_ratio_merging_DEL | 0.5 | Do not merge breakpoints with basepair identity more than this value for deletion |
| sv.cutesv.pacbio_hifi.diff_ratio_merging_INS | 0.9 | Do not merge breakpoints with basepair identity more than this value for insertion |
| sv.cutesv.pacbio_hifi.max_cluster_bias_DEL | 1000 | Maximum distance to cluster reads together for deletion |
| sv.cutesv.pacbio_hifi.max_cluster_bias_DUP | 500 | Maximum distance to cluster reads together for duplication |
| sv.cutesv.pacbio_hifi.max_cluster_bias_INS | 1000 | Maximum distance to cluster reads together for insertion |
| sv.cutesv.pacbio_hifi.max_cluster_bias_INV | 500 | Maximum distance to cluster reads together for inversion |
| sv.cutesv.pacbio_hifi.max_cluster_bias_TRA | 50 | Maximum distance to cluster reads together for translocation |
| sv.cutesv.pacbio_hifi.remain_reads_ratio | 1.0 | The ratio of reads remaining in a cluster. Set lower when the alignment data have high quality; a value over 0.5 is recommended |
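
The CRAM parameters above can be combined in one configuration file. The following is a minimal sketch for a long-read run that skips short tandem repeat detection and loosens two cuteSV settings; the keys come from the table, while the values are arbitrary examples rather than recommendations.

```groovy
// Illustrative overrides for CRAM-level parameters (values are examples only)
params {
    cram {
        call_str = false      // skip short tandem repeat detection
    }
    sv {
        cutesv {
            min_support = 5   // default is 10
            max_size = -1     // -1 reports all SVs regardless of size
        }
    }
}
```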

gVCF

| key | default | description |
| --- | --- | --- |
| gvcf.merge_preset | DeepVariant | allowed values: [gatk, gatk_unfiltered, DeepVariant, DeepVariant_unfiltered] |

VCF

| key | default | description |
| --- | --- | --- |
| vcf.start | | allowed values: [normalize, annotate, classify, filter, inheritance, classify_samples, filter_samples]. for reanalysis, this defines the step from which the workflow starts |
| vcf.annotate.annotsv_cache_dir | installed | |
| vcf.annotate.ensembl_gene_mapping | installed | |
| vcf.annotate.vep_buffer_size | 1000 | for details, see here |
| vcf.annotate.vep_cache_dir | installed | |
| vcf.annotate.vep_plugin_dir | installed | |
| vcf.annotate.vep_plugin_hpo | installed | |
| vcf.annotate.vep_plugin_inheritance | installed | |
| vcf.annotate.vep_plugin_vkgl_mode | 1 | allowed values: [0=full VKGL, 1=public VKGL]. update vcf.annotate.GRCh38.vep_plugin_vkgl accordingly |
| vcf.annotate.GRCh38.capice_model | installed | |
| vcf.annotate.GRCh38.vep_custom_phylop | installed | |
| vcf.annotate.GRCh38.vep_plugin_clinvar | installed | |
| vcf.annotate.GRCh38.vep_plugin_gnomad | installed | |
| vcf.annotate.GRCh38.vep_plugin_green_db_enabled | false | enabling is only allowed for academic use, for details see here |
| vcf.annotate.GRCh38.vep_plugin_green_db | installed | |
| vcf.annotate.GRCh38.vep_plugin_spliceai_indel | installed | |
| vcf.annotate.GRCh38.vep_plugin_spliceai_snv | installed | |
| vcf.annotate.GRCh38.vep_plugin_utrannotator | installed | |
| vcf.annotate.GRCh38.vep_plugin_vkgl | installed | update vcf.annotate.vep_plugin_vkgl_mode accordingly |
| vcf.classify.annotate_path | 1 | allowed values: [0=false, 1=true]. annotate variant-consequences with classification tree path |
| vcf.classify.GRCh38.decision_tree | installed | for details, see here |
| vcf.classify_samples.annotate_path | 1 | allowed values: [0=false, 1=true]. annotate variant-consequences per sample with classification tree path |
| vcf.classify_samples.GRCh38.decision_tree | installed | for details, see here |
| vcf.filter.classes | VUS,LP,P | for details, see here |
| vcf.filter.consequences | true | allowed values: [true, false]. true: filter individual consequences, false: keep all consequences for a variant if one consequence filter passes |
| vcf.filter_samples.classes | U1,U2 | for details, see here |
| vcf.report.gado_genes | installed | |
| vcf.report.gado_hpo | installed | |
| vcf.report.gado_predict_info | installed | |
| vcf.report.gado_predict_matrix | installed | |
| vcf.report.include_crams | true | allowed values: [true, false]. true: include CRAM files in the report for showing alignments in the genome browser, false: do not include the CRAM files in the report, so no alignments are shown in the genome browser. This results in a smaller report size |
| vcf.report.max_records | | |
| vcf.report.max_samples | | |
| vcf.report.template | | for details, see here |
| vcf.report.GRCh38.genes | installed | |
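
As a sketch of how the VCF parameters fit together, the configuration below restarts a reanalysis at the classify step, keeps all consequences of a variant once one of them passes the filter, and leaves CRAM files out of the report. The keys and allowed values come from the table above; the combination itself is only an example.

```groovy
// Illustrative reanalysis settings (the combination is an example, not a recommendation)
params {
    vcf {
        start = 'classify'          // one of the allowed vcf.start values
        filter {
            consequences = false    // keep all consequences if one passes the filter
        }
        report {
            include_crams = false   // smaller report, no alignments in the genome browser
        }
    }
}
```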

Profiles

VIP pre-defines two profiles. The default profile is Slurm with fallback to local in case Slurm cannot be discovered.

| key | description |
| --- | --- |
| local | for details, see here |
| slurm | for details, see here |

Additional profiles (for details, see here) can be added to your configuration file and used on the command line, for example to run VIP on Amazon, Azure or Google Cloud.
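
A minimal sketch of an additional profile is shown below. It uses the standard Nextflow `profiles` scope with the Nextflow `sge` executor, chosen here only because it needs little setup. The profile name and queue name are hypothetical, and a cloud profile would need further executor-specific settings that are not covered here.

```groovy
// Illustrative extra profile; 'my_sge' and 'long' are hypothetical names
profiles {
    my_sge {
        process {
            executor = 'sge'   // Nextflow executor for Sun Grid Engine compatible schedulers
            queue = 'long'     // queue to submit jobs to
        }
    }
}
```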

Process

By default, each process is assigned 4 CPUs, 8GB of memory and a maximum runtime of 4 hours. Depending on your system specifications and your analysis, you may need to adjust these values. For information on how to update the process configuration, see the Nextflow documentation. The following sections list all processes and their non-default configuration.

FASTQ

| process label | configuration |
| --- | --- |
| concat_fastq | default |
| concat_fastq_paired_end | default |
| minimap2_align | cpus=8 memory='16GB' time='23h' |
| minimap2_align_paired_end | cpus=8 memory='16GB' time='23h' |

CRAM

| process label | configuration |
| --- | --- |
| concat_vcf | default |
| cram_validate | default |
| cutesv_call | cpus=4 memory='8GB' time='5h' |
| deepvariant_call | cpus=default memory='2GB * cpus' time='5h' |
| deepvariant_call_duo | cpus=default memory='4GB * cpus' time='5h' |
| deepvariant_call_trio | cpus=default memory='4GB * cpus' time='5h' |
| deepvariant_concat_gvcf | cpus=default memory='2GB' time='30m' |
| deepvariant_concat_vcf | cpus=default memory='2GB' time='30m' |
| deepvariant_joint_call | cpus=default memory='2GB' time='30m' |
| expansionhunter_call | cpus=4 memory='16GB' time='5h' |
| manta_joint_call | cpus=4 memory='8GB' time='5h' |
| straglr_call | default |
| vcf_merge_str | default |
| vcf_merge_sv | default |

gVCF

| process label | configuration |
| --- | --- |
| gvcf_liftover | default |
| gvcf_validate | memory='100MB' time='30m' |
| gvcf_merge | memory='2GB' time='30m' |

VCF

| process label | configuration |
| --- | --- |
| vcf_annotate | cpus=4 memory='8GB' time='4h' |
| vcf_annotate_publish | default |
| vcf_classify | memory='2GB' |
| vcf_classify_publish | default |
| vcf_classify_samples | memory='2GB' |
| vcf_classify_samples_publish | default |
| vcf_concat | default |
| vcf_filter | default |
| vcf_filter_samples | default |
| vcf_inheritance | memory='2GB' |
| vcf_liftover | default |
| vcf_normalize | default |
| vcf_report | memory='4GB' |
| vcf_slice | default |
| vcf_split | memory='100MB' time='30m' |
| vcf_validate | memory='100MB' time='30m' |
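
Process resources can be adjusted in the additional configuration file through the standard Nextflow `process` scope. The sketch below assumes that the process labels listed above can be targeted with Nextflow's `withLabel` selector; the resource values are arbitrary examples.

```groovy
// Illustrative resource overrides; assumes the listed labels work with withLabel
process {
    // give the annotation step more memory and time than its listed configuration
    withLabel: 'vcf_annotate' {
        memory = '16GB'
        time = '8h'
    }

    // raise the CPU count and runtime for long-read alignment
    withLabel: 'minimap2_align' {
        cpus = 16
        time = '47h'
    }
}
```

Settings given inside a `withLabel` block apply only to processes carrying that label, so other processes should keep their listed configuration.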

Environment

See https://github.com/molgenis/vip/tree/main/config for an overview of available environment variables. Notably, this allows you to use different Apptainer containers for the tools that VIP relies on.
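
Environment variables are set through the standard Nextflow `env` scope. The sketch below shows only the shape of such an override: `VIP_EXAMPLE_VAR` and the container path are made-up placeholders, and the real variable names should be taken from the configuration files linked above.

```groovy
// Shape of an environment override; VIP_EXAMPLE_VAR is a made-up placeholder name
env {
    VIP_EXAMPLE_VAR = '/path/to/custom/container.sif'   // hypothetical path
}
```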