2021
DOI: 10.1038/s41598-020-80786-0

Embeddings from deep learning transfer GO annotations beyond homology

Abstract: Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity o…
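The abstract is cut off here, so the distance measure, the embedding model, and the scoring the authors use are not visible on this page. As a rough illustration of the general idea only (transferring annotations from the closest annotated proteins in embedding space rather than from sequence-similar ones), the following minimal sketch assumes Euclidean distance and a simple k-nearest-neighbor lookup. The function names, the toy 3-dimensional vectors, and the GO identifiers are hypothetical placeholders, not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_go_terms(query, lookup, k=1):
    """Collect the GO terms of the k annotated proteins closest to `query`.

    `lookup` maps a protein ID to (embedding, set of GO terms).
    Returns {GO term: smallest distance at which the term was found}.
    Assumption: Euclidean distance; the paper may use another measure.
    """
    # Rank all annotated proteins by distance to the query embedding.
    ranked = sorted(
        lookup.items(),
        key=lambda item: euclidean(query, item[1][0]),
    )
    hits = {}
    for _protein_id, (embedding, terms) in ranked[:k]:
        dist = euclidean(query, embedding)
        for term in terms:
            hits[term] = min(dist, hits.get(term, math.inf))
    return hits


# Toy, made-up data: real embeddings have hundreds of dimensions or more,
# and the GO identifiers below are placeholders for illustration.
lookup = {
    "protA": ([0.0, 0.0, 1.0], {"GO:0000001"}),
    "protB": ([1.0, 0.0, 0.0], {"GO:0000002", "GO:0000003"}),
}
query = [0.9, 0.1, 0.0]
predicted = transfer_go_terms(query, lookup, k=1)
```

With `k=1` the query inherits only the terms of its single nearest neighbor (`protB` in this toy data); a larger `k` would pool terms from several neighbors, and the stored distance could be turned into a confidence score.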

Cited by 126 publications (109 citation statements)
References 46 publications
“…Embeddings have been used successfully as exclusive input to predicting secondary structure and subcellular localization at performance levels almost reaching (Alley et al 2019; Heinzinger et al 2019; Rives et al 2021) or even exceeding (Elnaggar et al 2021; Stärk et al 2021) state-of-the-art methods using evolutionary information from MSAs as input. Embeddings can even substitute sequence similarity for homology-based annotation transfer (Littmann et al 2021a; Littmann et al 2021b). The power of such embeddings has been increasing with the advance of algorithms (Elnaggar et al 2021).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Previously, machine-learning methods in computational biology leveraged data-driven protein representations such as substitution matrices, capturing biophysical features (Henikoff & Henikoff, 1992), family-specific profiles (Stormo et al, 1982), or evolutionary couplings (Morcos et al, 2011) that capture evolutionary features. Now, embeddings provide competitive results for many prediction tasks (Littmann et al, 2021; Rao et al, 2019, 2020). Protein LMs may even be combined with other representations to gain even better performance (Rives et al, 2019; Villegas-Morcillo et al, 2020).…”
Section: Commentary Background Information
Citation type: mentioning
confidence: 99%
“…Via the "extract" stage, the pipeline incorporates supervised and unsupervised approaches for protein embeddings to further enhance analytical potential out-of-the-box. For instance, users can extract secondary structure in 3- and 8-states for embeddings from SeqVec (Heinzinger et al, 2019) and ProtBert (Elnaggar et al, 2020), or transfer GO annotations using embeddings of any available LM (Littmann et al, 2021). Pipeline runs are reproducible, as configurations are defined through files, and the output is stored in easily exchangeable formats, e.g., CSVs, FASTA, and HDF5 (The HDF Group, 2000).…”
Section: Commentary Background Information
Citation type: mentioning
confidence: 99%
“…Furthermore, the method enables candidate selections for follow-up in vivo and in planta studies that will eventually reveal the biological roles of these functional centers such as those reported in Joudoi et al (2013), Shen et al (2019), Vaz Dias et al (2019), Angkawijaya et al (2020), Lee et al (2020), and Turek et al (2020). We foresee that emerging experimental data will inform and strengthen motif refinement efforts, and enable the development of modern machine learning techniques that incorporate multiple features ranging from the classical physicochemical properties of protein domains and protein-protein interaction (PPI) networks to GO based function predictions, to not only automate annotations for uncharacterized proteins, but also to identify hidden functional centers in complex multi-functional proteins (Rifaioglu et al, 2019; Bonetta and Valentino, 2020; Cai et al, 2020; Littmann et al, 2021).…”
Section: Conclusion and Future Perspective
Citation type: mentioning
confidence: 99%