Annotations¶

VIP annotates variant effects and genotype data for samples using a rich set of tools. Annotations can be used to classify variants using classification trees and can be displayed in reports.

Overview¶

The tables below list annotations available in most output files. Depending on the workflow and configuration used, additional annotations might be available; check the output file headers for the complete overview. Similarly, some annotations listed below might be missing from your output file depending on the sample sheet content and configuration.
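As a minimal sketch of checking the output file headers, the snippet below collects the annotation IDs declared in VCF meta-information lines. The function name and the example header lines are hypothetical and for illustration only; real output files declare many more fields.

```python
import re

# Matches header lines such as ##INFO=<ID=AF,Number=A,Type=Float,Description="...">
HEADER_RE = re.compile(r'^##(INFO|FORMAT|FILTER)=<ID=([^,>]+)')


def declared_annotations(header_lines):
    """Group the IDs declared in VCF meta-information lines by kind."""
    found = {"INFO": [], "FORMAT": [], "FILTER": []}
    for line in header_lines:
        match = HEADER_RE.match(line)
        if match:
            kind, identifier = match.groups()
            found[kind].append(identifier)
    return found


# Hypothetical header excerpt, not copied from a real VIP output file.
example_header = [
    '##fileformat=VCFv4.2',
    '##FILTER=<ID=PASS,Description="All filters passed">',
    '##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of the SV">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
]
annotations = declared_annotations(example_header)
print(annotations)
```

For a compressed output file, the same function can be fed the lines starting with `##` read through Python's `gzip.open(path, "rt")`.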

Filter¶

annotation	value	source	description
FILTER	MONOALLELIC	GLnexus	Site represents one ALT allele in a region with multiple variants that could not be unified into non-overlapping multi-allelic sites
FILTER	PASS	Straglr	All filters passed
FILTER	q5	cuteSV	Quality below 5
FILTER	UNRESOLVED	Spectre	An insertion that is longer than the read and thus we cannot predict the full size

Format¶

annotation	type	source	description
FORMAT/CD	float list	Spectre	Coverage read depth
FORMAT/DP	integer		Approximate read depth (reads with MQ=255 or with bad mates are filtered)
FORMAT/DR	integer	cuteSV	# High-quality reference reads
FORMAT/DV	integer	cuteSV	# High-quality variant reads
FORMAT/GQ	integer	cuteSV, Spectre	Genotype Quality
FORMAT/GT	string	cuteSV, Spectre, straglr-tsv2vcf	Genotype
FORMAT/HO	float list	Spectre	Homozygosity proportion
FORMAT/ID	string	Spectre	Population ID of supporting CNV calls
FORMAT/LC	integer	straglr-tsv2vcf	Locus coverage
FORMAT/PL	integer list	cuteSV	# Phred-scaled genotype likelihoods rounded to the closest integer
FORMAT/PS	integer	Whatshap	Phase set identifier
FORMAT/RNC	character list		Reason for No Call in GT: . = n/a, M = Missing data, P = Partial data, I = gVCF input site is non-called, D = insufficient Depth of coverage, - = unrepresentable overlapping deletion, L = Lost/unrepresentable allele (other than deletion), U = multiple Unphased variants present, O = multiple Overlapping variants present, 1 = site is Monoallelic, no assertion about presence of REF or ALT allele
FORMAT/RU_CALL	string	straglr-tsv2vcf	Most frequent actual repeat motif
FORMAT/RU_CI	string	straglr-tsv2vcf	95% confidence interval per allele. 'NA' if less than 2 reads were present.
FORMAT/RU_MATCH	integer	straglr-tsv2vcf	RU call matches catalog (allowing shift/IUPAC, 1 = match, 0 = mismatch)
FORMAT/RU_NR	string	straglr-tsv2vcf	Number of repeat units per allele.
FORMAT/RU_SEEN	string	straglr-tsv2vcf	All RUs encountered with read counts
FORMAT/RU_SPAN	integer	straglr-tsv2vcf	Number of spanning reads per allele.
FORMAT/VI	string list	vip-inheritance-matcher	An enumeration of possible inheritance modes (Possible values: AR, AR_C, AD, AD_IP, XLR, XLD)
FORMAT/VIAB	float	annotate process	VIP calculated allele balance
FORMAT/VIC	string	vip-inheritance-matcher	Possible compound heterozygote variants
FORMAT/VID	integer	vip-inheritance-matcher	De novo variant
FORMAT/VIG	string list	vip-inheritance-matcher	Genes with an inheritance match
FORMAT/VIM	integer	vip-inheritance-matcher	Inheritance Match: Genotypes, affected statuses and known gene inheritance patterns match
FORMAT/VIPC_S	string list	vip-decision-tree	VIP decision tree classification for sample
FORMAT/VIPP_S	string list	vip-decision-tree	VIP decision tree path for sample
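The FORMAT/RNC codes listed above can be translated into readable reasons with a small lookup. This is an illustrative sketch: the helper name is hypothetical, and it assumes the value holds one character per allele, written either back to back (for example `I.`) or comma-separated.

```python
# Code-to-reason mapping taken from the FORMAT/RNC description in the table above.
RNC_REASONS = {
    ".": "n/a",
    "M": "Missing data",
    "P": "Partial data",
    "I": "gVCF input site is non-called",
    "D": "insufficient Depth of coverage",
    "-": "unrepresentable overlapping deletion",
    "L": "Lost/unrepresentable allele (other than deletion)",
    "U": "multiple Unphased variants present",
    "O": "multiple Overlapping variants present",
    "1": "site is Monoallelic, no assertion about presence of REF or ALT allele",
}


def decode_rnc(value):
    """Return one reason per allele code; commas between codes are ignored (assumption)."""
    codes = [char for char in value if char != ","]
    return [RNC_REASONS.get(code, "unknown code") for code in codes]


print(decode_rnc("I."))
```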

INFO¶

annotation	type	source	description
INFO/AC	integer list		Allele count in genotypes
INFO/AF	float list	cuteSV	Allele Frequency
INFO/AN	integer		Total number of alleles in called genotypes
INFO/AQ	integer list		Allele Quality score reflecting evidence for each alternate allele (Phred scale)
INFO/CHR2	string	cuteSV	Chromosome for END coordinate in case of a translocation
INFO/CILEN	integer list	cuteSV	Confidence interval around inserted/deleted material between breakends
INFO/CIPOS	integer list	cuteSV	Confidence interval around POS for imprecise variants
INFO/CN	integer	Spectre	Estimated copy number status
INFO/Disease	string	Stranger	Associated disorder
INFO/DisplayRU	string	Stranger	Display repeat unit familiar to clinician
INFO/END	integer	cuteSV, Spectre, straglr-tsv2vcf	End position of the structural variant
INFO/HGNCId	integer	Stranger	HGNC gene id for associated disease gene
INFO/IMPRECISE	boolean	cuteSV	Imprecise structural variant
INFO/InheritanceMode	string	Stranger	Main mode of inheritance for disorder
INFO/OLD_REC	string		Original variant. Format: CHR|POS|REF|ALT|USED_ALT_IDX
INFO/PRECISE	boolean	cuteSV	Precise structural variant
INFO/RankScore	string	Stranger	RankScore for variant in this family as family(str):score(int)
INFO/RE	integer	cuteSV	Number of reads supporting this record
INFO/REPID	string	straglr-tsv2vcf	Locus identifier.
INFO/RU_CAT	string	straglr-tsv2vcf	Catalog repeat motif
INFO/Source	string	Stranger	Source collection for variant definition
INFO/SourceDisplay	string	Stranger	Source for variant definition, display
INFO/SourceId	string	Stranger	Source id for variant definition
INFO/STR_NORMAL_MAX	integer	Stranger	Max number of repeats allowed to call as normal
INFO/STR_PATHOLOGIC_MIN	integer	Stranger	Min number of repeats required to call as pathologic
INFO/STR_STATUS	string list	Stranger	Repeat expansion status. Alternatives in [normal, pre_mutation, full_mutation]
INFO/STRAND	string list	cuteSV	Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)
INFO/SVLEN	integer	cuteSV, Spectre	Length of the SV
INFO/SVSUPPORT	string	Spectre	Indicator if a SV support was found in a provided SNFJ file
INFO/SVTYPE	string	cuteSV, Spectre, straglr-tsv2vcf	Type of copy number variant
INFO/SweGenMean	float	Stranger	Average number of repeat unit copies in population
INFO/SweGenStd	float	Stranger	Standard deviation of number of repeat unit copies in population
INFO/VIPC_S	string list		VIP decision tree classification (samples)
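To illustrate how INFO/STR_NORMAL_MAX and INFO/STR_PATHOLOGIC_MIN could relate to the INFO/STR_STATUS values, here is a sketch of one plausible reading of the two thresholds. This is an assumption based only on the descriptions in the table; Stranger's actual logic may differ, and the threshold values used below are hypothetical.

```python
def classify_repeat_count(repeat_count, normal_max, pathologic_min):
    """Assumed interpretation: at or below normal_max is normal, at or above
    pathologic_min is a full mutation, anything in between is a pre-mutation."""
    if repeat_count <= normal_max:
        return "normal"
    if repeat_count >= pathologic_min:
        return "full_mutation"
    return "pre_mutation"


# Hypothetical thresholds for illustration, not taken from any catalog.
for count in (20, 37, 45):
    print(count, classify_repeat_count(count, normal_max=35, pathologic_min=40))
```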

INFO/CSQ¶

annotation	type	source	description
INFO/CSQ/Allele	string	VEP	The variant allele used to calculate the consequence
INFO/CSQ/ALLELE_NUM	integer	VEP	Allele nr within the VCF file.
INFO/CSQ/ALPHSCORE	float	VEP plugin	AlphScore pathogenicity score for missense variants (see here)
INFO/CSQ/Amino_acids	string	VEP	Reference and variant amino acids
INFO/CSQ/apogee_Score	float	VEP plugin	APOGEE-2 pathogenicity score for mtDNA variants (used for protein coding genes)
INFO/CSQ/apogee_Uscore	float	VEP plugin	APOGEE-2 unbiased pathogenicity score mtDNA variants (used for protein coding genes)
INFO/CSQ/ASV_ACMG_class	string	VEP plugin	AnnotSv 'ACMG_class' output
INFO/CSQ/ASV_AnnotSV_ranking_criteria	string	VEP plugin	AnnotSv 'AnnotSV_ranking_criteria' output
INFO/CSQ/ASV_AnnotSV_ranking_score	string	VEP plugin	AnnotSv 'AnnotSV_ranking_score' output
INFO/CSQ/BIOTYPE	string	VEP	Biotype of transcript or regulatory feature
INFO/CSQ/CAPICE_CL	categorical	VEP plugin	CAPICE classification (see here). Categories: B, LB, VUS, LP, P
INFO/CSQ/CAPICE_SC	float	VEP plugin	CAPICE score
INFO/CSQ/cDNA_position	string	VEP	Position within the cDNA
INFO/CSQ/CDS_position	string	VEP	Position within the coding sequence
INFO/CSQ/CHECK_REF	string	VEP	Reports variants where the input reference does not match the expected reference
INFO/CSQ/CLIN_SIG	string list	VEP	ClinVar classification(s) (do not use, see here)
INFO/CSQ/clinVar_CLNID	integer list	VEP plugin	ClinVar variation identifier
INFO/CSQ/clinVar_CLNREVSTAT	categorical list	VEP plugin	ClinVar review status for the Variation ID. Categories: practice_guideline, reviewed_by_expert_panel, criteria_provided, _multiple_submitters, _no_conflicts, _single_submitter, _conflicting_interpretations, no_assertion_criteria_provided, no_assertion_provided
INFO/CSQ/clinVar_CLNSIG	string	VEP plugin	Clinical significance for this single variant; multiple values are separated by a vertical bar. Categories: Benign, Likely_benign, Uncertain_significance, Likely_pathogenic, Pathogenic, Conflicting_classifications_of_pathogenicity, Other
INFO/CSQ/clinVar_CLNSIGINCL	string	VEP plugin	Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance; multiple values are separated by a vertical bar. Categories: Benign, Likely_benign, Uncertain_significance, Likely_pathogenic, Pathogenic, Conflicting_interpretations_of_pathogenicity
INFO/CSQ/Codons	string	VEP	Reference and variant codon sequence
INFO/CSQ/Consequence	string list	VEP	Effect(s) described as Sequence Ontology term(s)
INFO/CSQ/DISTANCE	string	VEP	Shortest distance from variant to transcript
INFO/CSQ/existing_InFrame_oORFs	string	VEP plugin	The number of existing inFrame overlapping ORFs (inFrame oORF) at the 5 prime UTR
INFO/CSQ/existing_OutOfFrame_oORFs	string	VEP plugin	The number of existing out-of-frame overlapping ORFs (OutOfFrame oORF) at the 5 prime UTR
INFO/CSQ/existing_uORFs	string	VEP plugin	The number of existing uORFs with a stop codon within the 5 prime UTR
INFO/CSQ/Existing_variation	string list	VEP	Identifier(s) of co-located known variants
INFO/CSQ/EXON	string	VEP	The exon number (out of total number)
INFO/CSQ/Feature	string	VEP	RefSeq ID of feature
INFO/CSQ/Feature_type	categorical	VEP	VEP feature type. Categories: Transcript, RegulatoryFeature, MotifFeature
INFO/CSQ/FATHMM_MKL_NC	float	VEP plugin	The FATHMM-MKL score for Non-Coding Single Nucleotide Variants (SNVs)
INFO/CSQ/five_prime_UTR_variant_annotation	string	VEP plugin	Output the annotation of a given 5 prime UTR variant
INFO/CSQ/five_prime_UTR_variant_consequence	string	VEP plugin	Output the variant consequences of a given 5 prime UTR variant: uAUG_gained, uAUG_lost, uSTOP_lost or uFrameshift
INFO/CSQ/FLAGS	string list	VEP	Transcript quality flags (cds_start_NF: CDS 5' incomplete, cds_end_NF: CDS 3' incomplete)
INFO/CSQ/GADO_PD	categorical	VEP plugin	GADO prediction for the relation between the HPO terms of the proband(s) and the gene, HC: high confidence, LC: low confidence. Categories: LC, HC
INFO/CSQ/GADO_SC	float	VEP plugin	The combined prioritization GADO Z-score over the HPO of the proband(s) terms for this case
INFO/CSQ/Gene	string	VEP	NCBI Gene ID of affected gene
INFO/CSQ/gnomAD_COV	float	VEP plugin	gnomAD coverage (percent of individuals in gnomAD source)
INFO/CSQ/gnomAD_AF	float	VEP plugin	gnomAD allele frequency
INFO/CSQ/gnomAD_FAF95	float	VEP plugin	gnomAD filter allele frequency (95% confidence)
INFO/CSQ/gnomAD_FAF99	float	VEP plugin	gnomAD filter allele frequency (99% confidence)
INFO/CSQ/gnomAD_HN	integer	VEP plugin	gnomAD number of homozygotes
INFO/CSQ/gnomAD_QC	string list	VEP plugin	gnomAD quality control filters that failed
INFO/CSQ/gnomAD_SRC	categorical	VEP plugin	gnomAD source (E=exomes, G=genomes, T=total)
INFO/CSQ/Grantham	string	VEP plugin	Grantham Matrix score - Grantham, R. Amino Acid Difference Formula to Help Explain Protein Evolution, Science 1974 Sep 6;185(4154):862-4
INFO/CSQ/HGNC_ID	integer	VEP	HGNC gene identifier
INFO/CSQ/HGVS_OFFSET	string	VEP	Indicates by how many bases the HGVS notations for this variant have been shifted
INFO/CSQ/HGVSc	string	VEP	HGVS nomenclature: coding DNA reference sequence
INFO/CSQ/HGVSp	string	VEP	HGVS nomenclature: protein reference sequence
INFO/CSQ/HIGH_INF_POS	string	VEP	A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP)
INFO/CSQ/hmtvar_DiseaseScore	float	VEP plugin	HmtVar disease score for mtDNA variants (used for tRNA genes)
INFO/CSQ/HPO	string list	VEP plugin	Human phenotype ontology terms that match
INFO/CSQ/IMPACT	categorical	VEP	Impact as predicted by VEP. Categories: LOW, MODERATE, HIGH, MODIFIER
INFO/CSQ/IncompletePenetrance	string	VEP plugin	Boolean indicating if the gene is known for incomplete penetrance (1:true)
INFO/CSQ/InheritanceModesGene	string list	VEP plugin	List of inheritance modes for the gene
INFO/CSQ/INTRON	string	VEP	The intron number (out of total number)
INFO/CSQ/mitoTip_Score	float	VEP plugin	MitoTIP pathogenicity score for mtDNA variants (used for tRNA genes)
INFO/CSQ/mitoTip_Quartile	string	VEP plugin	MitoTIP pathogenicity score quartile
INFO/CSQ/MOTIF_NAME	string	VEP	The source and identifier of a transcription factor binding profile aligned at this position
INFO/CSQ/MOTIF_POS	string	VEP	The relative position of the variation in the aligned TFBP
INFO/CSQ/MOTIF_SCORE_CHANGE	string	VEP	The difference in motif score of the reference and variant sequences for the TFBP
INFO/CSQ/ncER	float	VEP plugin	The non-coding essential regulation (ncER) score indicates if a region is likely to be essential in terms of regulation.
INFO/CSQ/PHENO	integer list	VEP	Indicates if existing variant is associated with a phenotype, disease or trait; multiple values correspond to multiple values in the Existing_variation field
INFO/CSQ/phyloP	string	VEP custom	Conservation p-values, see here
INFO/CSQ/PICK	integer	VEP	Boolean indicating if this is the VEP picked transcript
INFO/CSQ/PolyPhen	float	VEP	PolyPhen score
INFO/CSQ/Protein_position	string	VEP	Position within the protein
INFO/CSQ/PUBMED	integer list	VEP	PubMed citations
INFO/CSQ/REFSEQ_MATCH	string	VEP	Flag indicating whether and how the RefSeq model differs from the underlying genome
INFO/CSQ/REFSEQ_OFFSET	string	VEP	?
INFO/CSQ/ReMM	float	VEP plugin	The Regulatory Mendelian Mutation (ReMM) score was created for relevance prediction of non-coding variations in the human genome in terms of Mendelian diseases.
INFO/CSQ/SIFT	float	VEP	SIFT score
INFO/CSQ/SOMATIC	integer list	VEP	Somatic status of existing variant(s); multiple values correspond to multiple values in the Existing_variation field
INFO/CSQ/SOURCE	string	VEP	?
INFO/CSQ/SpliceAI_pred_DP_AG	float	VEP plugin	SpliceAI predicted effect on splicing. Delta position for acceptor gain
INFO/CSQ/SpliceAI_pred_DP_AL	float	VEP plugin	SpliceAI predicted effect on splicing. Delta position for acceptor loss
INFO/CSQ/SpliceAI_pred_DP_DG	float	VEP plugin	SpliceAI predicted effect on splicing. Delta position for donor gain
INFO/CSQ/SpliceAI_pred_DP_DL	float	VEP plugin	SpliceAI predicted effect on splicing. Delta position for donor loss
INFO/CSQ/SpliceAI_pred_DS_AG	float	VEP plugin	SpliceAI predicted effect on splicing. Delta score for acceptor gain
INFO/CSQ/SpliceAI_pred_DS_AL	float	VEP plugin	SpliceAI predicted effect on splicing. Delta score for acceptor loss
INFO/CSQ/SpliceAI_pred_DS_DG	float	VEP plugin	SpliceAI predicted effect on splicing. Delta score for donor gain
INFO/CSQ/SpliceAI_pred_DS_DL	float	VEP plugin	SpliceAI predicted effect on splicing. Delta score for donor loss
INFO/CSQ/SpliceAI_pred_SYMBOL	string	VEP plugin	SpliceAI gene symbol
INFO/CSQ/STRAND	string	VEP	The DNA strand (1 or -1) on which the transcript/feature lies
INFO/CSQ/SYMBOL	string	VEP	Gene symbol
INFO/CSQ/SYMBOL_SOURCE	string	VEP	The source of the gene symbol
INFO/CSQ/TRANSCRIPTION_FACTORS	string	VEP	?
INFO/CSQ/VIPC	string	vip-decision-tree	VIP decision tree classification for variant effect
INFO/CSQ/VIPP	string list	vip-decision-tree	VIP decision tree path for variant effect
INFO/CSQ/VKGL	string	VEP plugin	?
INFO/CSQ/VKGL_CL	string	VEP plugin	VKGL consensus variant classification
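The INFO/CSQ sub-fields above are stored in a single INFO value: VEP writes one pipe-delimited entry per variant effect, separates entries with commas, and lists the field order in the header description after `Format:`. The sketch below splits such a value into dictionaries. The function names, the shortened header line, and the example value are hypothetical; real VIP output contains many more fields.

```python
import re


def csq_field_names(header_line):
    """Extract the ordered field names from the CSQ header description."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format: ...' list found in CSQ header line")
    return match.group(1).strip().split("|")


def parse_csq(info_value, field_names):
    """Split a CSQ value into one dict per variant effect."""
    records = []
    for entry in info_value.split(","):
        values = entry.split("|")
        records.append(dict(zip(field_names, values)))
    return records


# Hypothetical, shortened header and value for illustration only.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
          'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|PICK">')
names = csq_field_names(header)
csq = "T|missense_variant|MODERATE|GENE1|1,T|intron_variant|MODIFIER|GENE2|"
effects = parse_csq(csq, names)
picked = [effect for effect in effects if effect["PICK"] == "1"]
print(picked)
```

All values are returned as strings, and empty strings mark missing values. In VEP's VCF output, multiple values within one field (for example several Consequence terms) are joined with `&`, so list-typed fields need a further split.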

Details¶

VIP uses the Ensembl Variant Effect Predictor (VEP) to annotate all variants with their consequences. VEP is run with the refseq option for transcripts, and with the flags for SIFT and PolyPhen annotations enabled.

Plugins¶

Below we describe the other sources which we annotate using the VEP plugin framework.

CAPICE¶

CAPICE is a computational method for predicting the pathogenicity of SNVs and InDels. It is a gradient boosting tree model that uses a variety of genomic annotations also used by the CADD score and is trained on clinical significance. CAPICE performs consistently across diverse independent synthetic and real clinical data sets. It outperforms the current best method in pathogenicity estimation for variants of different molecular consequences and allele frequencies.

We run the CAPICE application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the CAPICE output file.

VKGL¶

The datashare workgroup of VKGL has set up a central database to enable mutual sharing of variant classifications through a partly automatic process. An additional goal is the public sharing of these data. The currently publicly available part of the database consists of DNA variant classifications established based on (former) diagnostic questions.

We add the classifications from an export of the database and use a VEP plugin to annotate the VEP output with the classifications from this file.

SpliceAI¶

SpliceAI is an open-source deep learning splicing prediction algorithm that has demonstrated a high ability to predict splicing defects caused by DNA variations.

We use the available precomputed SpliceAI scores and a copy of the available VEP plugin to annotate the VEP output with the scores from this file.
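A common way to summarise the four SpliceAI delta scores is to take their maximum. The sketch below does this for one parsed variant effect. It is an illustration only: the helper name and the example values are hypothetical, the effect is assumed to be a dict of strings keyed by the INFO/CSQ field names listed in the overview, and nothing here implies that VIP itself combines the scores this way.

```python
# The four delta score fields from the INFO/CSQ overview.
DS_FIELDS = (
    "SpliceAI_pred_DS_AG",
    "SpliceAI_pred_DS_AL",
    "SpliceAI_pred_DS_DG",
    "SpliceAI_pred_DS_DL",
)


def max_delta_score(effect):
    """Return the highest available delta score, or None if all are missing."""
    scores = [float(effect[field]) for field in DS_FIELDS if effect.get(field)]
    return max(scores) if scores else None


# Hypothetical values; an empty string represents a missing annotation.
effect = {
    "SpliceAI_pred_DS_AG": "0.01",
    "SpliceAI_pred_DS_AL": "0.85",
    "SpliceAI_pred_DS_DG": "",
    "SpliceAI_pred_DS_DL": "0.10",
}
print(max_delta_score(effect))
```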

AnnotSV¶

AnnotSV is a program for annotating and ranking structural variations from genomes of several organisms.

We run the AnnotSV application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the AnnotSV output file.

HPO¶

A file based on the HPO phenotype_to_genes.txt is used to annotate VEP consequences with the HPO terms associated with the gene of this consequence.

Inheritance¶

A file based on the CGD database is used to annotate VEP consequences with the inheritance modes associated with the gene of this consequence.

Grantham¶

The Grantham score attempts to predict the distance between two amino acids, in an evolutionary sense. A lower Grantham score reflects less evolutionary distance. A higher Grantham score reflects a greater evolutionary distance.

We use a copy of the VEP plugin by Duarte Molha to annotate the VEP output with Grantham scores.

GADO¶

GADO can be used to prioritize genes based on the HPO terms of a patient.

We run the GADO command-line application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the GADO output file.

AlphScore¶

AlphScore is a method to predict the pathogenicity of missense variants using features derived from AlphaFold2.

We add the available precomputed scores of AlphScore using a custom VEP plugin.

ncER¶

The non-coding essential regulation (ncER) score indicates if a region is likely to be essential in terms of regulation. The ncER file VIP uses is the version provided by GREEN-VARAN (https://github.com/edg1983/GREEN-VARAN) on Zenodo: https://zenodo.org/records/5636163. If overlapping regions are encountered (which can occur in lifted-over resources), the highest score is annotated.
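The rule for overlapping regions can be sketched as follows. This is not the plugin's implementation: the function name is hypothetical, and regions are assumed to be (start, end, score) tuples with 1-based inclusive coordinates on a single chromosome.

```python
def highest_overlapping_score(position, regions):
    """Return the highest score among regions covering the position, or None."""
    scores = [score for start, end, score in regions if start <= position <= end]
    return max(scores) if scores else None


# Hypothetical regions; the first two overlap between positions 150 and 200.
regions = [(100, 200, 0.4), (150, 250, 0.9), (300, 400, 0.7)]
print(highest_overlapping_score(175, regions))
```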

ReMM¶

The Regulatory Mendelian Mutation (ReMM) score was created for relevance prediction of non-coding variations (SNVs and small InDels) in the human genome (hg19) in terms of Mendelian diseases. The VEP plugin is built on top of the GREEN-DB dataset (GRCh38) for ReMM scores: https://zenodo.org/records/3955933. If overlapping regions are encountered (which can occur in lifted-over resources), the highest score is annotated.

FATHMM-MKL¶

FATHMM-MKL predicts the functional consequences of coding and non-coding single nucleotide variants (SNVs). This plugin annotates non-coding scores only, and is built on top of the GREEN-DB dataset (GRCh38) for FATHMM-MKL non-coding scores: https://zenodo.org/records/3981121

GREEN-DB constraint scores¶

GREEN-DB is a comprehensive collection of 2.4 million regulatory elements in the human genome collected from previously published databases, high-throughput screenings and functional studies. This plugin annotates the constraint scores only, and is built on top of the GREEN-DB bed files (GRCh38): https://zenodo.org/records/5636209. GREEN-DB constraint scores are annotated per region type: enhancers, promoters, bivalent, insulators, silencers. If multiple regions of the same type overlap, VIP annotates the highest constraint score.
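Keeping the highest constraint score per region type amounts to a grouped maximum, sketched below. The function name, the input shape (pairs of region type and score for the regions overlapping one variant), and the scores are hypothetical and not taken from the plugin.

```python
def highest_score_per_type(overlapping_regions):
    """Map each region type to the highest constraint score seen for it."""
    best = {}
    for region_type, score in overlapping_regions:
        if region_type not in best or score > best[region_type]:
            best[region_type] = score
    return best


# Hypothetical regions overlapping a single variant.
overlaps = [("enhancer", 0.3), ("enhancer", 0.8), ("promoter", 0.5)]
print(highest_score_per_type(overlaps))
```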

APOGEE 2¶

APOGEE-2 predicts pathogenicity scores for missense mitochondrial variants. A custom APOGEE VEP plugin annotates mitochondrial variants with computed APOGEE scores.

MitoTIP¶

MitoTIP predicts pathogenicity scores for mtDNA tRNA variants. A custom MitoTIP VEP plugin annotates mitochondrial variants with computed mitoTip scores and their corresponding quartiles (Q1-Q4).

HmtVar¶

HmtVar contains disease scores for mtDNA SNVs. A custom HmtVar VEP plugin annotates mitochondrial tRNA variants with computed disease scores.

BRAIN-MAGNET¶

BRAIN-MAGNET provides predictions for all possible SNPs from NSC NCREs (~100 million), so variants of interest can be scored from the pre-scored data. Please note that this plugin is not enabled by default. It can be added through the "params.vcf.vep_additional_plugins" configuration option. Resources for this plugin should be provided by the user and are not part of the VIP install script. The resources can be found here: resources