Annotations¶
VIP annotates variant effects and genotype data for samples using a rich set of tools. Annotations can be used to classify variants using classification trees and can be displayed in reports.
Overview¶
The table below lists the annotations available in most output files. Depending on the workflow and the configuration used, additional annotations might be available; check the output file headers for the complete overview. Similarly, some annotations listed below might be missing from your output file, depending on the sample sheet content and configuration.
annotation | type | source | description |
---|---|---|---|
FORMAT/VI | string list | vip-inheritance-matcher | An enumeration of possible inheritance modes (Possible values: AR, AR_C, AD, AD_IP, XLR, XLD) |
FORMAT/VIC | string | vip-inheritance-matcher | Possible compound heterozygote variants |
FORMAT/VID | integer | vip-inheritance-matcher | De novo variant |
FORMAT/VIG | string list | vip-inheritance-matcher | Genes with an inheritance match |
FORMAT/VIM | integer | vip-inheritance-matcher | Inheritance Match: Genotypes, affected statuses and known gene inheritance patterns match |
FORMAT/VIPC_S | string list | vip-decision-tree | VIP decision tree classification for sample |
FORMAT/VIPP_S | string list | vip-decision-tree | VIP decision tree path for sample |
INFO/CSQ/Allele | string | VEP | The variant allele used to calculate the consequence |
INFO/CSQ/ALLELE_NUM | integer | VEP | Allele number within the VCF file |
INFO/CSQ/ALPHSCORE | float | VEP plugin | AlphScore pathogenicity score for missense variants (see here) |
INFO/CSQ/Amino_acids | string | VEP | Reference and variant amino acids |
INFO/CSQ/ASV_ACMG_class | string | VEP plugin | AnnotSV 'ACMG_class' output |
INFO/CSQ/ASV_AnnotSV_ranking_criteria | string | VEP plugin | AnnotSV 'AnnotSV_ranking_criteria' output |
INFO/CSQ/ASV_AnnotSV_ranking_score | string | VEP plugin | AnnotSV 'AnnotSV_ranking_score' output |
INFO/CSQ/BIOTYPE | string | VEP | Biotype of transcript or regulatory feature |
INFO/CSQ/CAPICE_CL | categorical | VEP plugin | CAPICE classification (see here). Categories: B, LB, VUS, LP, P |
INFO/CSQ/CAPICE_SC | float | VEP plugin | CAPICE score |
INFO/CSQ/cDNA_position | string | VEP | Position within the cDNA |
INFO/CSQ/CDS_position | string | VEP | Position within the coding sequence |
INFO/CSQ/CHECK_REF | string | VEP | Reports variants where the input reference does not match the expected reference |
INFO/CSQ/CLIN_SIG | string list | VEP | ClinVar classification(s) (do not use, see here) |
INFO/CSQ/clinVar_CLNID | integer list | VEP plugin | ClinVar variation identifier |
INFO/CSQ/clinVar_CLNREVSTAT | categorical list | VEP plugin | ClinVar review status for the Variation ID. Categories: practice_guideline, reviewed_by_expert_panel, criteria_provided, _multiple_submitters, _no_conflicts, _single_submitter, _conflicting_interpretations, no_assertion_criteria_provided, no_assertion_provided |
INFO/CSQ/clinVar_CLNSIG | string | VEP plugin | Clinical significance for this single variant; multiple values are separated by a vertical bar. Categories: Benign, Likely_benign, Uncertain_significance, Likely_pathogenic, Pathogenic, Conflicting_classifications_of_pathogenicity, Other |
INFO/CSQ/clinVar_CLNSIGINCL | string | VEP plugin | Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance; multiple values are separated by a vertical bar. Categories: Benign, Likely_benign, Uncertain_significance, Likely_pathogenic, Pathogenic, Conflicting_interpretations_of_pathogenicity |
INFO/CSQ/Codons | string | VEP | Reference and variant codon sequence |
INFO/CSQ/Consequence | string list | VEP | Effect(s) described as Sequence Ontology term(s) |
INFO/CSQ/DISTANCE | string | VEP | Shortest distance from variant to transcript |
INFO/CSQ/existing_InFrame_oORFs | string | VEP plugin | The number of existing inFrame overlapping ORFs (inFrame oORF) at the 5 prime UTR |
INFO/CSQ/existing_OutOfFrame_oORFs | string | VEP plugin | The number of existing out-of-frame overlapping ORFs (OutOfFrame oORF) at the 5 prime UTR |
INFO/CSQ/existing_uORFs | string | VEP plugin | The number of existing uORFs with a stop codon within the 5 prime UTR |
INFO/CSQ/Existing_variation | string list | VEP | Identifier(s) of co-located known variants |
INFO/CSQ/EXON | string | VEP | The exon number (out of total number) |
INFO/CSQ/Feature | string | VEP | Ensembl stable ID of feature |
INFO/CSQ/Feature_type | categorical | VEP | VEP feature type. Categories: Transcript, RegulatoryFeature, MotifFeature |
INFO/CSQ/FATHMM_MKL_NC | float | VEP plugin | The FATHMM-MKL score for Non-Coding Single Nucleotide Variants (SNVs) |
INFO/CSQ/five_prime_UTR_variant_annotation | string | VEP plugin | Output the annotation of a given 5 prime UTR variant |
INFO/CSQ/five_prime_UTR_variant_consequence | string | VEP plugin | Output the variant consequences of a given 5 prime UTR variant: uAUG_gained, uAUG_lost, uSTOP_lost or uFrameshift |
INFO/CSQ/FLAGS | string list | VEP | Transcript quality flags (cds_start_NF: CDS 5' incomplete, cds_end_NF: CDS 3' incomplete) |
INFO/CSQ/GADO_PD | categorical | VEP plugin | GADO prediction for the relation between the HPO terms of the proband(s) and the gene, HC: high confidence, LC: low confidence. Categories: LC, HC |
INFO/CSQ/GADO_SC | float | VEP plugin | The combined prioritization GADO Z-score over the HPO terms of the proband(s) for this case |
INFO/CSQ/Gene | string | VEP | Ensembl stable ID of affected gene |
INFO/CSQ/gnomAD_COV | float | VEP plugin | gnomAD coverage (percent of individuals in gnomAD source) |
INFO/CSQ/gnomAD_AF | float | VEP plugin | gnomAD allele frequency |
INFO/CSQ/gnomAD_FAF95 | float | VEP plugin | gnomAD filtering allele frequency (95% confidence) |
INFO/CSQ/gnomAD_FAF99 | float | VEP plugin | gnomAD filtering allele frequency (99% confidence) |
INFO/CSQ/gnomAD_HN | integer | VEP plugin | gnomAD number of homozygotes |
INFO/CSQ/gnomAD_QC | string list | VEP plugin | gnomAD quality control filters that failed |
INFO/CSQ/gnomAD_SRC | categorical | VEP plugin | gnomAD source (E=exomes, G=genomes, T=total) |
INFO/CSQ/Grantham | string | VEP plugin | Grantham Matrix score - Grantham, R. Amino Acid Difference Formula to Help Explain Protein Evolution, Science 1974 Sep 6;185(4154):862-4 |
INFO/CSQ/HGNC_ID | integer | VEP | HGNC gene identifier |
INFO/CSQ/HGVS_OFFSET | string | VEP | Indicates by how many bases the HGVS notations for this variant have been shifted |
INFO/CSQ/HGVSc | string | VEP | HGVS nomenclature: coding DNA reference sequence |
INFO/CSQ/HGVSp | string | VEP | HGVS nomenclature: protein reference sequence |
INFO/CSQ/HIGH_INF_POS | string | VEP | A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP) |
INFO/CSQ/HPO | string list | VEP plugin | Human Phenotype Ontology terms that match |
INFO/CSQ/IMPACT | categorical | VEP | Impact as predicted by VEP. Categories: LOW, MODERATE, HIGH, MODIFIER |
INFO/CSQ/IncompletePenetrance | string | VEP plugin | Boolean indicating if the gene is known for incomplete penetrance (1:true) |
INFO/CSQ/InheritanceModesGene | string list | VEP plugin | List of inheritance modes for the gene |
INFO/CSQ/INTRON | string | VEP | The intron number (out of total number) |
INFO/CSQ/MOTIF_NAME | string | VEP | The source and identifier of a transcription factor binding profile aligned at this position |
INFO/CSQ/MOTIF_POS | string | VEP | The relative position of the variation in the aligned TFBP |
INFO/CSQ/MOTIF_SCORE_CHANGE | string | VEP | The difference in motif score of the reference and variant sequences for the TFBP |
INFO/CSQ/ncER | float | VEP plugin | The non-coding essential regulation (ncER) score indicates if a region is likely to be essential in terms of regulation. |
INFO/CSQ/PHENO | integer list | VEP | Indicates if existing variant is associated with a phenotype, disease or trait; multiple values correspond to multiple values in the Existing_variation field |
INFO/CSQ/phyloP | string | VEP custom | Conservation p-values, see here |
INFO/CSQ/PICK | integer | VEP | Boolean indicating if this is the VEP picked transcript |
INFO/CSQ/PolyPhen | float | VEP | PolyPhen score |
INFO/CSQ/Protein_position | string | VEP | Position within the protein |
INFO/CSQ/PUBMED | integer list | VEP | PubMed citations |
INFO/CSQ/REFSEQ_MATCH | string | VEP | Flag indicating whether and how the RefSeq model differs from the underlying genome |
INFO/CSQ/REFSEQ_OFFSET | string | VEP | ? |
INFO/CSQ/ReMM | float | VEP plugin | The Regulatory Mendelian Mutation (ReMM) score was created for relevance prediction of non-coding variations in the human genome in terms of Mendelian diseases. |
INFO/CSQ/SIFT | float | VEP | SIFT score |
INFO/CSQ/SOMATIC | integer list | VEP | Somatic status of existing variant(s); multiple values correspond to multiple values in the Existing_variation field |
INFO/CSQ/SOURCE | string | VEP | ? |
INFO/CSQ/SpliceAI_pred_DP_AG | float | VEP plugin | SpliceAI predicted effect on splicing. Delta position for acceptor gain |
INFO/CSQ/SpliceAI_pred_DP_AL | float | VEP plugin | SpliceAI predicted effect on splicing. Delta position for acceptor loss |
INFO/CSQ/SpliceAI_pred_DP_DG | float | VEP plugin | SpliceAI predicted effect on splicing. Delta position for donor gain |
INFO/CSQ/SpliceAI_pred_DP_DL | float | VEP plugin | SpliceAI predicted effect on splicing. Delta position for donor loss |
INFO/CSQ/SpliceAI_pred_DS_AG | float | VEP plugin | SpliceAI predicted effect on splicing. Delta score for acceptor gain |
INFO/CSQ/SpliceAI_pred_DS_AL | float | VEP plugin | SpliceAI predicted effect on splicing. Delta score for acceptor loss |
INFO/CSQ/SpliceAI_pred_DS_DG | float | VEP plugin | SpliceAI predicted effect on splicing. Delta score for donor gain |
INFO/CSQ/SpliceAI_pred_DS_DL | float | VEP plugin | SpliceAI predicted effect on splicing. Delta score for donor loss |
INFO/CSQ/SpliceAI_pred_SYMBOL | string | VEP plugin | SpliceAI gene symbol |
INFO/CSQ/STRAND | string | VEP | The DNA strand (1 or -1) on which the transcript/feature lies |
INFO/CSQ/SYMBOL | string | VEP | Gene symbol |
INFO/CSQ/SYMBOL_SOURCE | string | VEP | The source of the gene symbol |
INFO/CSQ/TRANSCRIPTION_FACTORS | string | VEP | ? |
INFO/CSQ/VIPC | string | vip-decision-tree | VIP decision tree classification for variant effect |
INFO/CSQ/VIPP | string list | vip-decision-tree | VIP decision tree path for variant effect |
INFO/CSQ/VKGL | string | VEP plugin | ? |
INFO/CSQ/VKGL_CL | string | VEP plugin | VKGL consensus variant classification |
Details¶
VIP uses the Ensembl Variant Effect Predictor (VEP) to annotate all variants with their consequences. We use VEP with the refseq option for the transcripts, and with the flags for sift and polyphen annotations enabled.
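
As an illustration of how the INFO/CSQ annotations listed in the overview can be read, the sketch below splits a VEP CSQ value into one dictionary per consequence. It relies on the usual VEP VCF convention that the CSQ header description ends with `Format: field1|field2|...`, that consequences are comma-separated and that sub-fields are pipe-separated. The header line, gene names and values are hypothetical and heavily shortened; a real VIP output file contains many more sub-fields, so always take the field order from the header of your own file.

```python
from __future__ import annotations

import re


def csq_field_names(header_line: str) -> list[str]:
    """Extract the CSQ sub-field names from a ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^"]+)', header_line)
    if match is None:
        raise ValueError("no 'Format: ...' section found in CSQ header line")
    return match.group(1).strip().split("|")


def parse_csq(info_value: str, field_names: list[str]) -> list[dict[str, str]]:
    """Split a CSQ INFO value into one dict per consequence (transcript/feature)."""
    records = []
    for entry in info_value.split(","):
        values = entry.split("|")
        # Empty strings represent missing values.
        records.append(dict(zip(field_names, values)))
    return records


# Hypothetical, shortened header line and CSQ value (illustration only).
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|PICK">'
)
csq = "T|missense_variant|MODERATE|GENE1|1,T|intron_variant|MODIFIER|GENE2|"

names = csq_field_names(header)
records = parse_csq(csq, names)
for record in records:
    print(record["SYMBOL"], record["Consequence"], record["IMPACT"])
```

In practice a VCF library is usually the more robust choice for reading the file itself; the functions above only show how the CSQ value is structured.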
Plugins¶
Below we describe the other sources which we annotate using the VEP plugin framework.
CAPICE¶
CAPICE is a computational method for predicting the pathogenicity of SNVs and InDels. It is a gradient boosting tree model trained on clinical significance, using a variety of genomic annotations that are also used by the CADD score. CAPICE performs consistently across diverse independent synthetic and real clinical data sets. It outperforms the current best method in pathogenicity estimation for variants of different molecular consequences and allele frequency.
We run the CAPICE application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the CAPICE output file.
VKGL¶
The datashare workgroup of VKGL has set up a central database to enable mutual sharing of variant classifications through a partly automatic process. An additional goal is the public sharing of these data. The currently publicly available part of the database consists of DNA variant classifications established based on (former) diagnostic questions.
We add the classifications from an export of the database and use a VEP plugin to annotate the VEP output with the classifications from this file.
SpliceAI¶
SpliceAI is an open-source deep learning splicing prediction algorithm that has demonstrated, over the past few years, a strong ability to predict splicing defects caused by DNA variants.
We use the available precomputed SpliceAI scores and a copy of the available VEP plugin to annotate the VEP output with the scores from this file.
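
The four SpliceAI delta scores (acceptor gain, acceptor loss, donor gain, donor loss) are often summarised by their maximum when filtering variants. The sketch below shows that summary for a single parsed consequence. Taking the maximum is a common convention and not necessarily what VIP itself does, and the `consequence` dictionary with its values is a hypothetical example.

```python
from __future__ import annotations

from typing import Optional

# Delta score sub-fields as listed in the annotation overview.
DS_FIELDS = (
    "SpliceAI_pred_DS_AG",
    "SpliceAI_pred_DS_AL",
    "SpliceAI_pred_DS_DG",
    "SpliceAI_pred_DS_DL",
)


def max_delta_score(consequence: dict[str, str]) -> Optional[float]:
    """Return the highest available SpliceAI delta score, or None if all are missing."""
    scores = [float(consequence[name]) for name in DS_FIELDS if consequence.get(name)]
    return max(scores) if scores else None


# Hypothetical consequence; an empty string means the value is missing.
consequence = {
    "SpliceAI_pred_DS_AG": "0.01",
    "SpliceAI_pred_DS_AL": "0.00",
    "SpliceAI_pred_DS_DG": "0.85",
    "SpliceAI_pred_DS_DL": "",
}
print(max_delta_score(consequence))  # prints 0.85
```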
AnnotSV¶
AnnotSV is a program for annotating and ranking structural variations from genomes of several organisms.
We run the AnnotSV application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the AnnotSV output file.
HPO¶
A file based on the HPO phenotype_to_genes.txt is used to annotate VEP consequences with the HPO terms associated with the gene of this consequence.
Inheritance¶
A file based on the CGD database is used to annotate VEP consequences with the inheritance modes associated with the gene of this consequence.
Grantham¶
The Grantham score attempts to predict the distance between two amino acids, in an evolutionary sense. A lower Grantham score reflects less evolutionary distance. A higher Grantham score reflects a greater evolutionary distance.
We use a copy of the VEP plugin by Duarte Molha to annotate the VEP output with Grantham scores.
GADO¶
GADO can be used to prioritize genes based on the HPO terms of a patient.
We run the GADO command-line application in the VIP pipeline and use a VEP plugin to annotate the VEP output with the scores from the GADO output file.
AlphScore¶
AlphScore is a method to predict the pathogenicity of missense variants using features derived from AlphaFold2.
We add the available precomputed scores of AlphScore using a custom VEP plugin.
ncER¶
The non-coding essential regulation (ncER) score indicates if a region is likely to be essential in terms of regulation. The ncER file VIP uses is the version provided by GREEN-VARAN (https://github.com/edg1983/GREEN-VARAN) on Zenodo (https://zenodo.org/records/5636163). If overlapping regions are encountered (which can occur in lifted-over resources), the highest score is annotated.
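
The rule that the highest score is annotated when regions overlap can be sketched as follows. This is a minimal illustration and not the plugin's actual implementation: it assumes regions in BED-style coordinates (0-based start, exclusive end) and a 1-based VCF position, and the `ScoredRegion` type and example values are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class ScoredRegion(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention, assumed here)
    end: int    # 0-based, exclusive
    score: float


def highest_overlapping_score(
    regions: list[ScoredRegion], chrom: str, pos: int
) -> Optional[float]:
    """Return the highest score of all regions covering a 1-based position."""
    scores = [
        region.score
        for region in regions
        # 1-based position p lies in [start, end) when start < p <= end.
        if region.chrom == chrom and region.start < pos <= region.end
    ]
    return max(scores, default=None)


# Hypothetical regions; the two chr1 regions overlap between 150 and 200.
regions = [
    ScoredRegion("chr1", 100, 200, 0.4),
    ScoredRegion("chr1", 150, 250, 0.9),
]
print(highest_overlapping_score(regions, "chr1", 160))  # prints 0.9
print(highest_overlapping_score(regions, "chr1", 120))  # prints 0.4
print(highest_overlapping_score(regions, "chr1", 300))  # prints None
```

A linear scan is enough for an illustration; for genome-wide files an indexed lookup would be the more realistic choice.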
ReMM¶
The Regulatory Mendelian Mutation (ReMM) score was created for relevance prediction of non-coding variations (SNVs and small InDels) in the human genome (hg19) in terms of Mendelian diseases. The VEP plugin is built on top of the GREEN-DB dataset (GRCh38) for ReMM scores (https://zenodo.org/records/3955933). If overlapping regions are encountered (which can occur in lifted-over resources), the highest score is annotated.
FATHMM-MKL¶
FATHMM-MKL predicts the functional consequences of coding and non-coding single nucleotide variants (SNVs). This plugin annotates non-coding scores only, and is built on top of the GREEN-DB dataset (GRCh38) for FATHMM-MKL non-coding scores: https://zenodo.org/records/3981121
GREEN-DB constraint scores¶
GREEN-DB is a comprehensive collection of 2.4 million regulatory elements in the human genome, collected from previously published databases, high-throughput screenings and functional studies. This plugin annotates the constraint scores only, and is built on top of the GREEN-DB BED files (GRCh38): https://zenodo.org/records/5636209 GREEN-DB constraint scores are annotated per region type: enhancers, promoters, bivalent, insulators, silencers. If multiple regions of the same type overlap, VIP annotates the highest constraint score.
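
The per-region-type rule can be sketched as follows. This is an illustration under assumptions, not the plugin's code: the input is a hypothetical list of (region type, constraint score) pairs for the regions that overlap one variant, and the region type labels are made up for the example.

```python
from __future__ import annotations


def highest_constraint_per_type(
    overlaps: list[tuple[str, float]],
) -> dict[str, float]:
    """Keep the highest constraint score for each region type."""
    best: dict[str, float] = {}
    for region_type, score in overlaps:
        if region_type not in best or score > best[region_type]:
            best[region_type] = score
    return best


# Hypothetical overlaps for one variant: two enhancers and one promoter.
overlaps = [("enhancer", 0.3), ("enhancer", 0.7), ("promoter", 0.5)]
print(highest_constraint_per_type(overlaps))
# prints {'enhancer': 0.7, 'promoter': 0.5}
```

Region types without an overlapping region are simply absent from the result.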