Predicting Pathogenicity of Missense Variants with Weakly Supervised Regression

Cao, Yue; Sun, Yuanfei; Karimi, Mostafa; Chen, Haoran; Moronfoye, Oluwaseyi; Shen, Yang

doi:10.1101/545913

Cited by 3 publications

(4 citation statements)

References 53 publications


“…The Disease Index Matrix (Pd) is a scale that associates each variant type (i.e., pair of wild type and variant residues) with the probability of being related to the disease. The scale has been estimated with a statistical analysis of a large data set of disease‐related and neutral variations retrieved from UniProtKB and dbSNP databases. AIBI directly predicted the probability of pathogenicity with weakly supervised linear regression, as detailed in the CAGI5 special issue (Cao et al, ) as the exact probabilities are not available for supervised machine learning. They used variants annotated with the class of pathogenicity in ClinVar, selected from MutPred2 15 features about molecular impacts upon variation, and designed parabola‐shaped loss functions that penalize the predicted probability of pathogenicity according to its supposed class. Color Genomics submitted four sets of predictions with LEAP (Lai et al, ), a machine learning framework that predicts variant pathogenicity according to features including: population frequencies from gnomAD; function prediction from SnpEFF (Cingolani et al, ), SIFT (Ng & Henikoff, ), PolyPhen‐2 (Adzhubei, Jordan, & Sunyaev, ) and MutationTaster2 (Schwarz, Cooper, Schuelke, & Seelow, ); splice impact estimation from Alamut (Interactive Biosoftware, Rouen, France) and Skippy (Woolfe, Mullikin, & Elnitski, ); indications of publications mentioning the variant and cancer associations from the subscription version of HGMD, indicating whether or not the variant is included in HGMD, whether or not it is associated with one or more articles curated by HGMD, and whether HGMD associates the variant with cancer (Stenson et al, ); and aggregate information from individuals who have undergone genetic testing.…”

Section: Methods (mentioning)

confidence: 99%
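The excerpt above describes the Disease Index Matrix as a scale that maps each (wild type, variant) residue pair to a probability of disease association, estimated statistically from labeled variations. A minimal sketch of that idea follows. The excerpt does not give the estimator, so the plain relative-frequency estimate, the function name `disease_index`, and the toy records are all assumptions made for illustration.

```python
from collections import Counter

def disease_index(variants):
    """Estimate a per-pair disease probability from labeled variants.

    `variants` is an iterable of (wild_type, variant_residue, is_disease)
    tuples. The relative-frequency estimate is an assumption; the cited
    work only says the scale comes from a statistical analysis.
    """
    disease = Counter()
    total = Counter()
    for wild_type, variant_residue, is_disease in variants:
        pair = (wild_type, variant_residue)
        total[pair] += 1
        if is_disease:
            disease[pair] += 1
    # Fraction of observed variations of each type that are disease-related.
    return {pair: disease[pair] / count for pair, count in total.items()}

# Hypothetical toy records, not data from UniProtKB or dbSNP.
toy_variants = [
    ("R", "W", True),
    ("R", "W", True),
    ("R", "W", False),
    ("A", "V", False),
]
pd_scale = disease_index(toy_variants)
```

A real scale would need far more observations per residue pair, and probably smoothing for rare pairs, before the fractions could be read as probabilities.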


Assessment of blind predictions of the clinical significance of BRCA1 and BRCA2 variants

Cline, Babbi, Bonache, et al. 2019

Human Mutation

Self Cite


Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly‐interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and specific steps to further advance this state of the art.


“…In addition to the other cancer‐related challenges outlined above, there are two that required prediction of the pathogenicity of germline variants in cancer‐related proteins: one for breast cancer risk from variants in BRCA1 and BRCA2 as characterized by the ENIGMA consortium (Cao et al, ; Cline et al, ; Padilla et al, ; Parsons et al, ), and the other for cancer risk of variants in CHEK2 in Latina breast cancer cases and ancestry matched controls (Voskanian et al, ).…”

Section: Introduction (mentioning)

confidence: 99%

Reports from the fifth edition of CAGI: The Critical Assessment of Genome Interpretation

et al. 2019


Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'kā‐jē/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of art in genome interpretation and of encouraging new methodological developments. This special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 was comprised of 14 challenges and engaged hundreds of participants from a dozen countries. This edition had a notable increase in splicing and expression regulatory variant challenges, while also continuing challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org.


“…In total, 2,026 variations of six tumor suppressors (CHEK2, BRCA1, BRCA2, BRIP1, RBBP8, and TP53) were collected. Using MutPred2, 15 features were extracted; together with a constant as the 16th feature, used in linear regression with a tailored loss function (Cao et al, ). Specifically, to describe a penalty more in line with the real biological processes while reducing the complexity of the optimization, the loss function needs to be convex and first‐order differentiable. To accommodate these two conditions, a parabola‐shaped polynomial of degree six as the loss function was implemented.…”

Section: Methods (mentioning)

confidence: 99%
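The citation statement above outlines linear regression on 15 features plus a constant, trained with a convex, first-order differentiable, degree-six loss. The sketch below shows one way such a setup could look. It is not the authors' implementation: the loss form (p - c)^6, the class centers in `CLASS_CENTERS`, the synthetic features, and the plain gradient-descent settings are all assumptions chosen for illustration.

```python
import random

# Hypothetical target probabilities per class label (not from the paper).
CLASS_CENTERS = {"benign": 0.05, "uncertain": 0.5, "pathogenic": 0.95}

def add_constant(features):
    """Append a constant 1.0 so 15 features become 16 regression inputs."""
    return [row + [1.0] for row in features]

def predict(weights, row):
    """Linear prediction; note it is not clipped to the [0, 1] range."""
    return sum(w * x for w, x in zip(weights, row))

def loss(weights, rows, centers):
    """Mean of (p - c)^6: convex in p, differentiable, flat near c."""
    total = sum((predict(weights, r) - c) ** 6 for r, c in zip(rows, centers))
    return total / len(rows)

def gradient(weights, rows, centers):
    """Gradient of the mean degree-six loss with respect to the weights."""
    grad = [0.0] * len(weights)
    for r, c in zip(rows, centers):
        slope = 6.0 * (predict(weights, r) - c) ** 5
        for j, x in enumerate(r):
            grad[j] += slope * x / len(rows)
    return grad

def fit(rows, centers, steps=500, learning_rate=0.005):
    """Plain gradient descent from zero weights (an assumed optimizer)."""
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = gradient(weights, rows, centers)
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights

# Synthetic stand-ins for 15 per-variant features in [0, 1].
random.seed(0)
features = [[random.random() for _ in range(15)] for _ in range(30)]
labels = [random.choice(list(CLASS_CENTERS)) for _ in features]

rows = add_constant(features)
centers = [CLASS_CENTERS[label] for label in labels]

initial_loss = loss([0.0] * 16, rows, centers)
weights = fit(rows, centers)
final_loss = loss(weights, rows, centers)
```

Compared with a squared error, the sixth power is nearly flat close to the class center and rises steeply far from it, so small deviations within a class cost little while large ones are penalized heavily. Only a class label, not an exact probability, is needed per variant, which is the weak supervision the excerpt refers to.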

Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer

et al. 2019

Self Cite


The availability of disease‐specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI‐5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV‐disease relationships.

