2023
DOI: 10.21203/rs.3.rs-3188248/v1
Preprint

VariPred: Enhancing Pathogenicity Prediction of Missense Variants Using Protein Language Models

Abstract: Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically, these prediction methods used hand-crafted features based on structural, evolutionary, or physicochemical properties of the variant. In this study we propose a novel framework that leverages the power of pre-trained protein language models to predict variant pathogenicity. We show that our …

Citations: cited by 3 publications (4 citation statements)
References: 35 publications
“…Furthermore, AI can be used to develop in silico methods to predict and simulate biological and chemical spaces. Examples of such approaches are cellular and genetic perturbation modelling (Bunne et al 2023; Prasad et al 2022), gene expression prediction (Avsec et al 2021; Kelley et al 2018; Linder et al 2023), variant effect prediction (Brandes et al 2022; Cheng et al 2023; Frazer et al 2021; Lin et al 2023a), protein structure prediction (Baek et al 2021; Jumper et al 2021; Lin et al 2023b), drug-target interaction prediction (Huang et al 2021; Wen et al 2017), and molecular docking simulations for drug design (Corso et al 2023; Gentile et al 2020). When it comes to determining the applicability of AI, we can refer to some guiding principles (Figure 1) that can help us to establish whether introducing AI to solve our problem is sensible.…”
Section: Accepted Manuscript
Citation type: mentioning
confidence: 99%
“…OpenFold (Ahdritz et al., 2022) and RoseTTAFold (Baek et al., 2021) have similar architecture and performance to AlphaFold and rely on deep MSAs. ESMFold (Z. Lin et al., 2023) and OmegaFold (Wu et al., 2022) are large language model (LLM)–based algorithms that do not use MSAs. Consequently, they have a faster execution than AlphaFold (ESMFold has precalculated structures for 600 million sequences!)…”
Section: Commentary
Citation type: mentioning
confidence: 99%
“…Among the most widely used in silico prediction tools are SIFT (Ng & Henikoff, 2001), PolyPhen‐2 (Adzhubei et al., 2010, 2013), and CADD (Rentzsch et al., 2018). More recent methods utilize advanced deep‐learning techniques (Frazer et al., 2021; Qi et al., 2021), including large language models (Brandes et al., 2022; Lin et al., 2023), to predict the pathogenicity of missense variants with greater accuracy. However, although predicted pathogenicity scores may aid in identifying a driver mutation, they do not elucidate how a variant impacts protein function.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Consequently, the model is able to produce a concise representation of the full protein sequence, without relying on three-dimensional information. These rich and meaningful representations of ESM-2 have aided numerous studies, including protein functional prediction [57], [58], protein structure prediction [55], protein-protein interaction prediction [59], protein multimodal representation [60], and protein design [61], [62].…”
Section: Introduction
Citation type: mentioning
confidence: 99%