Annotations

VIP annotates variant effects and genotype data for samples using a rich set of tools. Annotations can be used to classify variants using classification trees and can be displayed in reports.

Overview

The table below lists the annotations available in most output files. Depending on the workflow and configuration used, additional annotations might be available; check the output file headers for the complete overview. Similarly, some annotations listed below might be missing from your output file, depending on the sample sheet content and configuration.

annotation type source description
FORMAT/VI string list vip-inheritance-matcher An enumeration of possible inheritance modes (Possible values: AR, AR_C, AD, AD_IP, XLR, XLD)
FORMAT/VIC string vip-inheritance-matcher Possible compound heterozygote variants
FORMAT/VID integer vip-inheritance-matcher De novo variant
FORMAT/VIG string list vip-inheritance-matcher Genes with an inheritance match
FORMAT/VIM integer vip-inheritance-matcher Inheritance Match: Genotypes, affected statuses and known gene inheritance patterns match
FORMAT/VIPC_S string list vip-decision-tree VIP decision tree classification for sample
FORMAT/VIPP_S string list vip-decision-tree VIP decision tree path for sample
INFO/CSQ/Allele string VEP The variant allele used to calculate the consequence
INFO/CSQ/ALLELE_NUM integer VEP Allele number within the VCF file
INFO/CSQ/ALPHSCORE float VEP plugin AlphScore pathogenicity score for missense variants (see here)
INFO/CSQ/Amino_acids string VEP Reference and variant amino acids
INFO/CSQ/ASV_ACMG_class string VEP plugin AnnotSV 'ACMG_class' output
INFO/CSQ/ASV_AnnotSV_ranking_criteria string VEP plugin AnnotSV 'AnnotSV_ranking_criteria' output
INFO/CSQ/ASV_AnnotSV_ranking_score string VEP plugin AnnotSV 'AnnotSV_ranking_score' output
INFO/CSQ/BIOTYPE string VEP Biotype of transcript or regulatory feature
INFO/CSQ/CAPICE_CL categorical VEP plugin CAPICE classification (see here). Categories: B, LB, VUS, LP, P
INFO/CSQ/CAPICE_SC float VEP plugin CAPICE score
INFO/CSQ/cDNA_position string VEP Position within the cDNA
INFO/CSQ/CDS_position string VEP Position within the coding sequence
INFO/CSQ/CHECK_REF string VEP Reports variants where the input reference does not match the expected reference
INFO/CSQ/CLIN_SIG string list VEP ClinVar classification(s) (do not use, see here)
INFO/CSQ/clinVar_CLNID integer list VEP plugin ClinVar variation identifier
INFO/CSQ/clinVar_CLNREVSTAT categorical list VEP plugin ClinVar review status for the Variation ID. Categories: practice_guideline, reviewed_by_expert_panel, criteria_provided, _multiple_submitters, _no_conflicts, _single_submitter, _conflicting_interpretations, no_assertion_criteria_provided, no_assertion_provided
INFO/CSQ/clinVar_CLNSIG string VEP plugin Clinical significance for this single variant; multiple values are separated by a vertical bar. Categories: Benign, Likely_benign, Uncertain_significance, Likely_pathogenic, Pathogenic, Conflicting_interpretations_of_pathogenicity
INFO/CSQ/clinVar_CLNSIGINCL string VEP plugin Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance; multiple values are separated by a vertical bar. Categories: Benign, Likely_benign, Uncertain_significance, Likely_pathogenic, Pathogenic, Conflicting_interpretations_of_pathogenicity
INFO/CSQ/Codons string VEP Reference and variant codon sequence
INFO/CSQ/Consequence string list VEP Effect(s) described as Sequence Ontology term(s)
INFO/CSQ/DISTANCE string VEP Shortest distance from variant to transcript
INFO/CSQ/existing_InFrame_oORFs string VEP plugin The number of existing inFrame overlapping ORFs (inFrame oORF) at the 5 prime UTR
INFO/CSQ/existing_OutOfFrame_oORFs string VEP plugin The number of existing out-of-frame overlapping ORFs (OutOfFrame oORF) at the 5 prime UTR
INFO/CSQ/existing_uORFs string VEP plugin The number of existing uORFs with a stop codon within the 5 prime UTR
INFO/CSQ/Existing_variation string list VEP Identifier(s) of co-located known variants
INFO/CSQ/EXON string VEP The exon number (out of total number)
INFO/CSQ/Feature string VEP Ensembl stable ID of feature
INFO/CSQ/Feature_type categorical VEP VEP feature type. Categories: Transcript, RegulatoryFeature, MotifFeature
INFO/CSQ/FATHMM_MKL_NC float VEP plugin The FATHMM-MKL score for Non-Coding Single Nucleotide Variants (SNVs)
INFO/CSQ/five_prime_UTR_variant_annotation string VEP plugin Output the annotation of a given 5 prime UTR variant
INFO/CSQ/five_prime_UTR_variant_consequence string VEP plugin Output the variant consequences of a given 5 prime UTR variant: uAUG_gained, uAUG_lost, uSTOP_lost or uFrameshift
INFO/CSQ/FLAGS string list VEP Transcript quality flags (cds_start_NF: CDS 5' incomplete, cds_end_NF: CDS 3' incomplete)
INFO/CSQ/GADO_PD categorical VEP plugin GADO prediction for the relation between the HPO terms of the proband(s) and the gene, HC: high confidence, LC: low confidence. Categories: LC, HC
INFO/CSQ/GADO_SC float VEP plugin The combined GADO prioritization Z-score over the HPO terms of the proband(s) for this case
INFO/CSQ/Gene string VEP Ensembl stable ID of affected gene
INFO/CSQ/gnomAD_COV float VEP plugin gnomAD coverage (percent of individuals in gnomAD source)
INFO/CSQ/gnomAD_AF float VEP plugin gnomAD allele frequency
INFO/CSQ/gnomAD_FAF95 float VEP plugin gnomAD filtering allele frequency (95% confidence)
INFO/CSQ/gnomAD_FAF99 float VEP plugin gnomAD filtering allele frequency (99% confidence)
INFO/CSQ/gnomAD_HN integer VEP plugin gnomAD number of homozygotes
INFO/CSQ/gnomAD_QC string list VEP plugin gnomAD quality control filters that failed
INFO/CSQ/gnomAD_SRC categorical VEP plugin gnomAD source (E=exomes, G=genomes, T=total)
INFO/CSQ/Grantham string VEP plugin Grantham Matrix score - Grantham, R. Amino Acid Difference Formula to Help Explain Protein Evolution, Science 1974 Sep 6;185(4154):862-4
INFO/CSQ/HGNC_ID integer VEP HGNC gene identifier
INFO/CSQ/HGVS_OFFSET string VEP Indicates by how many bases the HGVS notations for this variant have been shifted
INFO/CSQ/HGVSc string VEP HGVS nomenclature: coding DNA reference sequence
INFO/CSQ/HGVSp string VEP HGVS nomenclature: protein reference sequence
INFO/CSQ/HIGH_INF_POS string VEP A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP)
INFO/CSQ/HPO string list VEP plugin Human Phenotype Ontology terms that match
INFO/CSQ/IMPACT categorical VEP Impact as predicted by VEP. Categories: LOW, MODERATE, HIGH, MODIFIER
INFO/CSQ/IncompletePenetrance string VEP plugin Boolean indicating if the gene is known for incomplete penetrance (1:true)
INFO/CSQ/InheritanceModesGene string list VEP plugin List of inheritance modes for the gene
INFO/CSQ/INTRON string VEP The intron number (out of total number)
INFO/CSQ/MOTIF_NAME string VEP The source and identifier of a transcription factor binding profile aligned at this position
INFO/CSQ/MOTIF_POS string VEP The relative position of the variation in the aligned TFBP
INFO/CSQ/MOTIF_SCORE_CHANGE string VEP The difference in motif score of the reference and variant sequences for the TFBP
INFO/CSQ/ncER float VEP plugin The non-coding essential regulation (ncER) score indicates if a region is likely to be essential in terms of regulation.
INFO/CSQ/PHENO integer list VEP Indicates if existing variant is associated with a phenotype, disease or trait; multiple values correspond to multiple values in the Existing_variation field
INFO/CSQ/phyloP string VEP custom Conservation p-values, see here
INFO/CSQ/PICK integer VEP Boolean indicating if this is the VEP picked transcript
INFO/CSQ/PolyPhen float VEP PolyPhen score
INFO/CSQ/Protein_position string VEP Position within the protein
INFO/CSQ/PUBMED integer list VEP PubMed citations
INFO/CSQ/REFSEQ_MATCH string VEP Flag indicating whether and how the RefSeq model differs from the underlying genome
INFO/CSQ/REFSEQ_OFFSET string VEP ?
INFO/CSQ/ReMM float VEP plugin The Regulatory Mendelian Mutation (ReMM) score was created for relevance prediction of non-coding variations in the human genome in terms of Mendelian diseases.
INFO/CSQ/SIFT float VEP SIFT score
INFO/CSQ/SOMATIC integer list VEP Somatic status of existing variant(s); multiple values correspond to multiple values in the Existing_variation field
INFO/CSQ/SOURCE string VEP ?
INFO/CSQ/SpliceAI_pred_DP_AG float VEP plugin SpliceAI predicted effect on splicing. Delta position for acceptor gain
INFO/CSQ/SpliceAI_pred_DP_AL float VEP plugin SpliceAI predicted effect on splicing. Delta position for acceptor loss
INFO/CSQ/SpliceAI_pred_DP_DG float VEP plugin SpliceAI predicted effect on splicing. Delta position for donor gain
INFO/CSQ/SpliceAI_pred_DP_DL float VEP plugin SpliceAI predicted effect on splicing. Delta position for donor loss
INFO/CSQ/SpliceAI_pred_DS_AG float VEP plugin SpliceAI predicted effect on splicing. Delta score for acceptor gain
INFO/CSQ/SpliceAI_pred_DS_AL float VEP plugin SpliceAI predicted effect on splicing. Delta score for acceptor loss
INFO/CSQ/SpliceAI_pred_DS_DG float VEP plugin SpliceAI predicted effect on splicing. Delta score for donor gain
INFO/CSQ/SpliceAI_pred_DS_DL float VEP plugin SpliceAI predicted effect on splicing. Delta score for donor loss
INFO/CSQ/SpliceAI_pred_SYMBOL string VEP plugin SpliceAI gene symbol
INFO/CSQ/STRAND string VEP The DNA strand (1 or -1) on which the transcript/feature lies
INFO/CSQ/SYMBOL string VEP Gene symbol
INFO/CSQ/SYMBOL_SOURCE string VEP The source of the gene symbol
INFO/CSQ/TRANSCRIPTION_FACTORS string VEP ?
INFO/CSQ/VIPC string vip-decision-tree VIP decision tree classification for variant effect
INFO/CSQ/VIPP string list vip-decision-tree VIP decision tree path for variant effect
INFO/CSQ/VKGL string VEP plugin ?
INFO/CSQ/VKGL_CL string VEP plugin VKGL consensus variant classification
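
As a minimal sketch of how the `INFO/CSQ/...` annotations listed above can be read from an output VCF: VEP stores them in a single `CSQ` INFO field whose sub-field order is given in the header after `Format:`. Consequence entries are separated by commas, sub-fields by `|`, and multiple values within one sub-field by `&`. The header line and CSQ value below are shortened, made-up illustrations (the gene symbols are hypothetical), and the helper names `csq_field_names` and `parse_csq` are not part of VIP.

```python
import re


def csq_field_names(header_line: str) -> list[str]:
    """Extract the CSQ sub-field names from a VEP '##INFO=<ID=CSQ,...>' header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format: ...' section found in CSQ header line")
    return match.group(1).strip().split("|")


def parse_csq(csq_value: str, field_names: list[str]) -> list[dict[str, str]]:
    """Split one INFO/CSQ value into one dict per annotated consequence."""
    records = []
    for entry in csq_value.split(","):
        values = entry.split("|")
        records.append(dict(zip(field_names, values)))
    return records


# Illustrative header and value, shortened to four sub-fields (not real VIP output).
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|PICK">'
)
names = csq_field_names(header)
records = parse_csq(
    "T|missense_variant|GENE1|1,"
    "T|intron_variant&non_coding_transcript_variant|GENE2|",
    names,
)

for record in records:
    consequences = record["Consequence"].split("&")  # multi-valued sub-field
    is_picked = record["PICK"] == "1"                # empty string means not picked
    print(record["SYMBOL"], consequences, is_picked)
```

Reading the field order from the header rather than hard-coding it keeps the parser working when the workflow or configuration adds or removes annotations.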

Details

VIP uses the Ensembl Variant Effect Predictor (VEP) to annotate all variants with their consequences. VEP is run with the RefSeq option for the transcripts, and with the flags for SIFT and PolyPhen annotations enabled.

Plugins

The sections below describe the other sources that are annotated using the VEP plugin framework.

CAPICE

CAPICE is a computational method for predicting the pathogenicity of SNVs and InDels. It is a gradient boosting tree model trained on clinical significance, using a variety of the genomic annotations that are also used by the CADD score. CAPICE performs consistently across diverse independent synthetic and real clinical data sets. It outperforms the current best method in pathogenicity estimation for variants of different molecular consequences and allele frequencies.

We run the CAPICE application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the CAPICE output file.

VKGL

The datashare workgroup of VKGL has set up a central database to enable mutual sharing of variant classifications through a partly automatic process. An additional goal is the public sharing of these data. The currently publicly available part of the database consists of DNA variant classifications established based on (former) diagnostic questions.

We use an export of the database and a VEP plugin to annotate the VEP output with the classifications from this file.

SpliceAI

SpliceAI is an open-source deep learning algorithm for splicing prediction that has, over the past few years, demonstrated a strong ability to predict splicing defects caused by DNA variants.

We use the available precomputed SpliceAI scores and a copy of the available VEP plugin to annotate the VEP output with the scores from this file.
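
The four SpliceAI delta scores (`SpliceAI_pred_DS_AG`, `SpliceAI_pred_DS_AL`, `SpliceAI_pred_DS_DG`, `SpliceAI_pred_DS_DL`) are often summarized by taking their maximum. The sketch below illustrates that summary on one parsed consequence record; it is an assumption about how a downstream user might combine the fields, not a description of what VIP does internally, and the helper name `max_delta_score` and the example values are hypothetical.

```python
from typing import Optional

# Sub-field names as listed in the annotation overview table.
DELTA_SCORE_FIELDS = (
    "SpliceAI_pred_DS_AG",
    "SpliceAI_pred_DS_AL",
    "SpliceAI_pred_DS_DG",
    "SpliceAI_pred_DS_DL",
)


def max_delta_score(record: dict[str, str]) -> Optional[float]:
    """Return the highest SpliceAI delta score in a record, or None if none is present."""
    scores = [
        float(record[field])
        for field in DELTA_SCORE_FIELDS
        if record.get(field)  # skip missing keys and empty strings
    ]
    return max(scores) if scores else None


# Made-up example values; an empty string stands for a missing annotation.
example = {
    "SpliceAI_pred_DS_AG": "0.01",
    "SpliceAI_pred_DS_AL": "0.00",
    "SpliceAI_pred_DS_DG": "0.85",
    "SpliceAI_pred_DS_DL": "",
}
print(max_delta_score(example))
print(max_delta_score({}))
```

Returning `None` when no score is present keeps unannotated variants distinguishable from variants with a delta score of 0.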

AnnotSV

AnnotSV is a program for annotating and ranking structural variations from genomes of several organisms.

We run the AnnotSV application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the AnnotSV output file.

HPO

A file based on the HPO phenotype_to_genes.txt is used to annotate VEP consequences with the HPO terms associated with the gene of this consequence.

Inheritance

A file based on the CGD database is used to annotate VEP consequences with the inheritance modes associated with the gene of this consequence.

Grantham

The Grantham score attempts to predict the distance between two amino acids, in an evolutionary sense. A lower Grantham score reflects less evolutionary distance. A higher Grantham score reflects a greater evolutionary distance.

We use a copy of the VEP plugin by Duarte Molha to annotate the VEP output with Grantham scores.

GADO

GADO can be used to prioritize genes based on the HPO terms of a patient.

We run the GADO command-line application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the GADO output file.

AlphScore

AlphScore is a method to predict the pathogenicity of missense variants using features derived from AlphaFold2.

We add the available precomputed scores of AlphScore using a custom VEP plugin.

ncER

The non-coding essential regulation (ncER) score indicates if a region is likely to be essential in terms of regulation. The ncER file VIP uses is the version provided by GREEN-VARAN (https://github.com/edg1983/GREEN-VARAN) on Zenodo: https://zenodo.org/records/5636163

ReMM

The Regulatory Mendelian Mutation (ReMM) score was created for relevance prediction of non-coding variations (SNVs and small InDels) in the human genome (hg19) in terms of Mendelian diseases. The VEP plugin is built on top of the GREEN-DB dataset (GRCh38) for ReMM scores: https://zenodo.org/records/3955933

FATHMM-MKL

FATHMM-MKL predicts the functional consequences of coding and non-coding single nucleotide variants (SNVs). This plugin annotates non-coding scores only, and is built on top of the GREEN-DB dataset (GRCh38) for FATHMM-MKL non-coding scores: https://zenodo.org/records/3981121

GREEN-DB constraint scores

GREEN-DB is a comprehensive collection of 2.4 million regulatory elements in the human genome, collected from previously published databases, high-throughput screenings and functional studies. This plugin annotates the constraint scores only, and is built on top of the GREEN-DB BED files (GRCh38): https://zenodo.org/records/5636209

GREEN-DB constraint scores are annotated per region type: enhancers, promoters, bivalent, insulators, silencers. If multiple regions of the same type overlap, VIP annotates the highest constraint score.
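
The "highest score per region type" rule can be illustrated with a small sketch. It assumes regions are given as BED-style tuples (0-based start, exclusive end) with a region type and a constraint score; the tuple layout, the function name `highest_constraint_per_type` and the example values are hypothetical and do not reflect the plugin's actual implementation.

```python
def highest_constraint_per_type(regions, position):
    """Return {region_type: highest constraint score} for regions overlapping position.

    regions:  iterable of (start, end, region_type, score) tuples using
              BED-style coordinates (0-based start, exclusive end).
    position: 0-based position of the variant.
    """
    best = {}
    for start, end, region_type, score in regions:
        if start <= position < end:  # the region overlaps the variant position
            if region_type not in best or score > best[region_type]:
                best[region_type] = score
    return best


# Made-up regions: two overlapping enhancers, one promoter, one distant silencer.
regions = [
    (100, 200, "enhancer", 0.42),
    (150, 250, "enhancer", 0.91),
    (120, 180, "promoter", 0.30),
    (300, 400, "silencer", 0.77),
]
print(highest_constraint_per_type(regions, 160))
```

For position 160 both enhancers overlap, so only the higher enhancer score is kept, while the promoter score is reported separately and the non-overlapping silencer is ignored.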