Zeyuan Wang scite author profile

Zeyuan Wang

2 Publications

6 Citation Statements Received

91 Citation Statements Given

Affiliations

Zhejiang University of Science and Technology

Publications

VariPred: Enhancing Pathogenicity Prediction of Missense Variants Using Protein Language Models

Lin, Wells, Wang et al. 2023

Preprint

Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically, these prediction methods used hand-crafted features based on structural, evolutionary, or physicochemical properties of the variant. In this study, we propose a novel framework that leverages pre-trained protein language models to predict variant pathogenicity. We show that our approach, VariPred (Variant impact Predictor), outperforms current state-of-the-art methods by using an end-to-end model that requires only the protein sequence as input. By exploiting one of the best-performing protein language models (ESM-1b), we established a robust classifier, VariPred, that requires no pre-calculation of structural features or multiple sequence alignments. We compared the performance of VariPred with other representative models, including 3Cnet, PolyPhen-2, FATHMM and ‘ESM variant’. VariPred outperformed all these methods on the ClinVar dataset, achieving an MCC of 0.727 vs. an MCC of 0.687 for the next closest predictor.
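The abstract reports its headline result as a Matthews correlation coefficient (MCC). As a reminder of what that number measures, here is a minimal sketch computing MCC from binary confusion-matrix counts, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Treating "pathogenic" as the positive class, the function name, and returning 0.0 when the denominator is zero are illustrative assumptions, not details taken from the preprint.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Assumed convention: return 0.0 when any marginal
    sum is zero, because the coefficient is otherwise undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, `matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)` gives roughly 0.70; these counts are made up for illustration and are not from the ClinVar evaluation. MCC uses all four cells of the confusion matrix, which makes it less flattering than accuracy when the pathogenic and benign classes are imbalanced.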

VariPred: Enhancing Pathogenicity Prediction of Missense Variants Using Protein Language Models

Lin, Wells, Wang et al. 2023

Preprint

Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically, these prediction methods used hand-crafted features based on structural, evolutionary, or physicochemical properties of the variant. In this study, we propose a novel framework that leverages pre-trained protein language models to predict variant pathogenicity. We show that our approach, VariPred (Variant impact Predictor), outperforms current state-of-the-art methods by using an end-to-end model that requires only the protein sequence as input. By exploiting one of the best-performing protein language models (ESM-1b), we established a robust classifier, VariPred, that requires no pre-calculation of structural features or multiple sequence alignments. We compared the performance of VariPred with other representative models, including 3Cnet, EVE and ‘ESM variant’. VariPred outperformed all these methods on the ClinVar dataset, achieving an MCC of 0.751 vs. an MCC of 0.690 for the next closest predictor.
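The abstract stresses that VariPred needs only the protein sequence as input. Its exact input encoding is not described in the abstract, so the following is a hypothetical preprocessing sketch. It builds the mutant sequence from a wild-type sequence and a one-letter missense notation such as `A4V`, and it crops long proteins to a window around the variant because ESM-1b accepts at most 1,022 residues. The helper names `apply_missense` and `crop_around`, and the centred-window strategy, are assumptions rather than VariPred's documented pipeline.

```python
import re

# Hypothetical preprocessing helpers; not VariPred's published pipeline.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MISSENSE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")
ESM1B_MAX_RESIDUES = 1022  # ESM-1b: 1,024 tokens minus the BOS and EOS tokens


def apply_missense(wild_type: str, variant: str) -> str:
    """Return the mutant sequence for a variant like 'A4V' (1-based position)."""
    match = MISSENSE.match(variant)
    if match is None:
        raise ValueError(f"unrecognised missense variant: {variant!r}")
    ref, pos_text, alt = match.groups()
    pos = int(pos_text)
    if not 1 <= pos <= len(wild_type):
        raise ValueError(f"position {pos} is outside a sequence of length {len(wild_type)}")
    if wild_type[pos - 1] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {wild_type[pos - 1]}")
    if ref == alt:
        raise ValueError("reference and alternate residues are identical")
    return wild_type[: pos - 1] + alt + wild_type[pos:]


def crop_around(sequence: str, pos: int, max_len: int = ESM1B_MAX_RESIDUES) -> tuple[str, int]:
    """Crop to at most max_len residues, roughly centred on 1-based pos.

    Returns the window and the variant's 1-based position inside it.
    """
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside a sequence of length {len(sequence)}")
    if len(sequence) <= max_len:
        return sequence, pos
    # Centre the window on the variant, then clamp it to the sequence ends.
    start = max(0, min(pos - 1 - max_len // 2, len(sequence) - max_len))
    return sequence[start : start + max_len], pos - start
```

For instance, `apply_missense("MKTAYIAK", "A4V")` returns `"MKTVYIAK"`. Matching wild-type and mutant windows from `crop_around` could then be passed to a protein language model; checking the reference residue first catches variants annotated against a different isoform.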
