“…The Disease Index Matrix (P_d) is a scale that associates each variant type (i.e., each pair of wild‐type and variant residues) with the probability that a variant of that type is related to disease. The scale was estimated through a statistical analysis of a large data set of disease‐related and neutral variations retrieved from the UniProtKB and dbSNP databases.
- AIBI directly predicted the probability of pathogenicity with weakly supervised linear regression, as detailed in the CAGI5 special issue (Cao et al, ). Weak supervision was needed because exact probabilities of pathogenicity are not available as training targets for supervised machine learning. They used variants annotated with a pathogenicity class in ClinVar, selected 15 features from MutPred2 that describe the molecular impact of a variant, and designed parabola‐shaped loss functions that penalize the predicted probability of pathogenicity according to the variant's annotated class.
- Color Genomics submitted four sets of predictions generated with LEAP (Lai et al, ), a machine learning framework that predicts variant pathogenicity from features including:
  - population frequencies from gnomAD;
  - functional predictions from SnpEff (Cingolani et al, ), SIFT (Ng & Henikoff, ), PolyPhen‐2 (Adzhubei, Jordan, & Sunyaev, ) and MutationTaster2 (Schwarz, Cooper, Schuelke, & Seelow, );
  - splice impact estimates from Alamut (Interactive Biosoftware, Rouen, France) and Skippy (Woolfe, Mullikin, & Elnitski, );
  - publication and cancer‐association indicators from the subscription version of HGMD (Stenson et al, ): whether the variant is included in HGMD, whether it is associated with one or more articles curated by HGMD, and whether HGMD associates it with cancer; and
  - aggregate information from individuals who have undergone genetic testing.
…”
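The quoted passage does not state the estimator behind the Disease Index Matrix. As a minimal sketch, assuming P_d for a residue pair is simply the fraction of disease‐related variants among all annotated variants of that pair, the scale could be tabulated as below. The function name `disease_index`, the tuple input format, and the toy data are hypothetical.

```python
from collections import Counter


def disease_index(variants):
    """Tabulate P_d for each (wild-type, variant) residue pair.

    `variants` is an iterable of (wild_type, variant, is_disease) tuples,
    with is_disease True for disease-related and False for neutral variants.
    Assumption: P_d is the fraction of disease-related variants among all
    observed variants of that pair; pairs never observed are omitted.
    """
    disease_counts = Counter()
    total_counts = Counter()
    for wild_type, variant, is_disease in variants:
        pair = (wild_type, variant)
        total_counts[pair] += 1
        if is_disease:
            disease_counts[pair] += 1
    # Counter returns 0 for pairs with no disease-related variants.
    return {pair: disease_counts[pair] / n for pair, n in total_counts.items()}


# Toy input (hypothetical, not drawn from UniProtKB or dbSNP).
toy = [("R", "W", True), ("R", "W", True), ("R", "W", False), ("A", "V", False)]
p_d = disease_index(toy)
print(p_d[("R", "W")])  # two of the three R->W variants are disease-related
```

With real data, rare residue pairs yield noisy ratios; adding pseudocounts is one common remedy, although whether the original scale used any smoothing is not stated here.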
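To illustrate how a parabola‐shaped, class‐dependent loss lets a linear model be trained on ClinVar classes rather than exact probability targets, the sketch below maps each class to a parabola with a center and a curvature and fits the model by plain gradient descent. The five‐class label set, the center and curvature values, the gradient‐descent fitting, the clipping of predictions to [0, 1], and all names (`CLASS_PARABOLA`, `fit_linear`, `predict`) are illustrative assumptions, not AIBI's published settings; the feature vectors stand in for the 15 selected MutPred2 features.

```python
# Hypothetical mapping from ClinVar class to (center, curvature) of a
# parabola-shaped loss. These values are illustrative assumptions only.
CLASS_PARABOLA = {
    "pathogenic": (1.00, 1.0),
    "likely_pathogenic": (0.90, 0.5),
    "uncertain": (0.50, 0.1),
    "likely_benign": (0.10, 0.5),
    "benign": (0.00, 1.0),
}


def parabola_loss(p, label):
    """Penalty for predicted probability p given the variant's class."""
    center, curvature = CLASS_PARABOLA[label]
    return curvature * (p - center) ** 2


def linear_score(w, b, x):
    """Unclipped linear model output w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_linear(features, labels, lr=0.1, epochs=2000):
    """Fit w, b by gradient descent on the mean parabola-shaped loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(features, labels):
            center, curvature = CLASS_PARABOLA[label]
            # Derivative of curvature * (p - center)^2 w.r.t. p, averaged over n.
            g = 2.0 * curvature * (linear_score(w, b, x) - center) / n
            for j in range(d):
                grad_w[j] += g * x[j]
            grad_b += g
        w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predicted probability of pathogenicity, clipped to [0, 1] (an assumption)."""
    return min(1.0, max(0.0, linear_score(w, b, x)))


if __name__ == "__main__":
    # Toy one-feature data set (hypothetical values).
    X = [[1.0], [0.8], [0.5], [0.2], [0.0]]
    y = ["pathogenic", "likely_pathogenic", "uncertain", "likely_benign", "benign"]
    w, b = fit_linear(X, y)
    for x, label in zip(X, y):
        print(label, round(predict(w, b, x), 3))
```

Giving less certain classes (here, variants of uncertain significance) a lower curvature means those examples pull the fit less strongly than confidently labeled ones, which is one way a class label can act as weak supervision for a probability.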