2021
DOI: 10.1038/s41586-021-04043-8

Disease variant prediction with deep generative models of evolutionary data

Abstract: Quantifying the pathogenicity of protein variants in human disease-related genes would have a profound impact on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences [1][2][3]. In principle, computational methods could support the large-scale interpretation of genetic variants. However, prior methods [4-7] have relied on training machine learning models on available clinical labels. Since these labels are sparse, biased, and of variable quality, the resul…

Citations: cited by 500 publications (602 citation statements)
References: 51 publications
“…TP53_PROF’s classification was compared with six scores that provided formal classification cutoff in the TP53_UMD database (Polyphen2 HumVar, Polyphen2 HumDiv, Sift, Condel, Provean and Mutassessor). EVE, a recently released deep generative model of evolutionary data [45], CHASM, a cancer-specific algorithm that showed peak performances in recent analysis [14, 15] and Revel, another in-silico tool that presented with best balanced accuracy in a recently published algorithms comparison [13, 59], were also used for this analysis with different defined cutoffs for LOF provided by the scores documentation (see Methods). Annotations based on the three experimental validations done on the set of 41 variants (presented in Supplementary Table S7A available online at http://bib.oxfordjournals.org/) were used as the truth set.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…For EVE, annotations were used for EVE90pct and EVE75pct, which account for the percentage of certainty. The EVE90pct sets 10% of the variants as ‘uncertain,’ and EVE75pct sets 25% of the variants as ‘uncertain’ [45].…”
Section: Methods
Citation type: mentioning
confidence: 99%
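
The EVE90pct / EVE75pct idea quoted above (set aside a fixed share of variants as 'uncertain', classify the rest) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited papers: the variant names, the toy scores, the use of binary entropy as the ambiguity measure, and the 0.5 cutoff between benign and pathogenic.

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) prediction; largest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def classify_with_uncertain(scores, uncertain_fraction):
    """Label each variant, setting the most ambiguous fraction to 'Uncertain'.

    scores: dict mapping a (hypothetical) variant name to a score in [0, 1].
    uncertain_fraction: 0.10 mimics a 90pct-style set, 0.25 a 75pct-style set.
    """
    n_uncertain = round(len(scores) * uncertain_fraction)
    # Rank variants from most to least ambiguous (assumed measure: entropy).
    ranked = sorted(scores, key=lambda v: binary_entropy(scores[v]), reverse=True)
    uncertain = set(ranked[:n_uncertain])
    labels = {}
    for variant, score in scores.items():
        if variant in uncertain:
            labels[variant] = "Uncertain"
        else:
            # Assumed cutoff: scores above 0.5 are called pathogenic.
            labels[variant] = "Pathogenic" if score > 0.5 else "Benign"
    return labels

# Toy data, invented for this sketch.
toy_scores = {"V1": 0.02, "V2": 0.48, "V3": 0.91, "V4": 0.55,
              "V5": 0.10, "V6": 0.99, "V7": 0.70, "V8": 0.30}
labels_75 = classify_with_uncertain(toy_scores, 0.25)
labels_90 = classify_with_uncertain(toy_scores, 0.10)
```

With eight toy variants, the 75pct-style call leaves two variants 'Uncertain' and the 90pct-style call leaves one; raising the uncertain fraction trades coverage for confidence in the remaining calls.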
“…) is a recently developed unsupervised computational method, which trained Bayesian variational autoencoders on multiple sequence alignments to classify variant effects based on a variant-specific, computed evolutionary index followed by a fitted global-local mixture of Gaussian Mixture Models (17).…”
Section: Eve (Evolutionary Models Of Variant Effects
Citation type: mentioning
confidence: 99%
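
The last step described in the statement above, turning a per-variant evolutionary index into a class call with a Gaussian mixture, can be sketched roughly as follows. This is a deliberately simplified stand-in: it fits a single two-component, one-dimensional mixture with plain EM rather than the global-local mixture the statement mentions, it does not train any variational autoencoder, and the index values are invented. It also assumes that a higher index means a variant is more likely pathogenic.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_component_gmm(xs, n_iter=100):
    """Fit a 1-D two-component Gaussian mixture with plain EM."""
    lo, hi = min(xs), max(xs)
    means = [lo, hi]                      # initialise at the extremes
    spread = (hi - lo) ** 2 / 4 or 1.0    # fall back to 1.0 if all xs are equal
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of the high-mean component for each point.
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            resp.append(p1 / (p0 + p1))
        # M-step: re-estimate weight, mean and variance of each component.
        for k in (0, 1):
            r = [ri if k == 1 else 1 - ri for ri in resp]
            total = sum(r)
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / total
            variances[k] = max(var, 1e-6)  # floor avoids a collapsed component
            weights[k] = total / len(xs)
    return weights, means, variances

def pathogenic_probability(x, weights, means, variances):
    """Posterior probability that x belongs to the high-mean component."""
    p0 = weights[0] * normal_pdf(x, means[0], variances[0])
    p1 = weights[1] * normal_pdf(x, means[1], variances[1])
    return p1 / (p0 + p1)

# Invented evolutionary-index values: a low cluster and a high cluster.
toy_index = [1.0, 1.5, 2.0, 1.2, 1.8, 7.0, 8.0, 7.5, 8.5, 9.0]
weights, means, variances = fit_two_component_gmm(toy_index)
p_high = pathogenic_probability(8.0, weights, means, variances)
p_low = pathogenic_probability(1.5, weights, means, variances)
```

The posterior from `pathogenic_probability` is the kind of score that a percentile-based 'uncertain' cutoff could then be applied to.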
“…Here, we created an MDR3-specific variant dataset and trained a machine learning algorithm using several established general prediction tools, namely EVE, EVmutation, PolyPhen-2, I-Mutant2.0, MUpro, MAESTRO, and PON-P2 (17)(18)(19)(20)(21)(22)(23), as well as half-sphere exposure, posttranslational modification (PTM) site influence, and secondary structure disruption as features to obtain an MDR3-specific prediction tool for help in classifying variants as benign or pathogenic (see Fig. 1 for a graphical overview).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Just as evolutionarily conserved individual residues are generally crucial to a protein’s proper function, the statistical covariation (arising from correlated evolution, i.e. coevolution) between pairs of residues [1, 2] carries information that is useful for predicting structural contacts [3–7] and protein–protein interactions [8–11] and their interfaces [12], intuiting novel protein conformations [5], understanding protein allostery [13], interpreting variants [14, 15], identifying functional domains [16–19], and reprogramming protein specificity [20]. However, despite the increasing prevalence of sequencing data, sampling of the phylogenetic tree is fundamentally limited and biased.…”
Section: Introduction
Citation type: mentioning
confidence: 99%