Preprint, 2022. DOI: 10.1101/2022.12.05.519073

NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations

Abstract: As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve performance. However, it mainly utilizes proteins with experimentally supported functional annotations, without leveraging the valuable information in the vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations (e.g., the Evolutionary Scale Modelling (ESM)-1b embedding) from protein sequences based on self-supervised learning…
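
The abstract refers to ESM-1b embeddings learned from protein sequences by self-supervised training. As a rough illustration only, and not the NetGO 3.0 pipeline itself, the sketch below shows one common way to obtain a per-protein ESM-1b embedding with the open-source fair-esm package; the example sequence and the mean-pooling step are assumptions made for illustration.

```python
import torch
import esm  # pip install fair-esm

# Load the pretrained ESM-1b model and its alphabet (fair-esm package).
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Toy sequence for illustration; a real pipeline would use UniProt sequences.
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

# Extract per-residue representations from the final (33rd) transformer layer.
with torch.no_grad():
    out = model(tokens, repr_layers=[33], return_contacts=False)
residue_repr = out["representations"][33]  # shape: (1, seq_len + 2, 1280)

# Drop the BOS/EOS tokens and mean-pool over residues to get a fixed-size
# 1280-dimensional protein embedding (one common pooling choice, assumed here).
seq_len = len(data[0][1])
protein_embedding = residue_repr[0, 1 : seq_len + 1].mean(dim=0)
print(protein_embedding.shape)  # torch.Size([1280])
```

Such fixed-size embeddings can then be fed to a downstream classifier for GO term prediction; how NetGO 3.0 combines them with its other component methods is described in the paper itself.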

Cited by 10 publications (17 citation statements). References 31 publications.

Citation statements:
“…Therefore, we do not include methods that are primarily based on sequence similarity, such as BLAST, Diamond, or their combinations, as baselines. For the time-based dataset evaluation, we selected three state-of-the-art methods developed by other groups: TALE [17], SPROF [18], and NetGO3 [19].…”
Section: Methods. Citation type: mentioning. Confidence: 99%
“…As baselines, we trained two methods, DeepGO-CNN (Kulmanov and Hoehndorf, 2019) and DeepGOZero (Kulmanov and Hoehndorf, 2022), and generated predictions without using any sequence-similarity component such as BLAST (Altschul et al., 1997) or Diamond (Buchfink et al., 2014). For the time-based dataset evaluation, we selected three state-of-the-art methods: TALE (Cao and Shen, 2021a), SPROF (Yuan et al., 2023b), and NetGO3 (Wang et al., 2023). We used the baseline methods' publicly available models to generate predictions.…”
Section: Methods. Citation type: mentioning. Confidence: 99%
“…On the other hand, methods that learn a classifier with PU data directly (Song et al., 2021) rely on optimization frameworks such as Majorization-Minimization (Lange and Yang, 2000) or Support Vector Machines (Cortes and Vapnik, 1995). However, in recent years, protein function prediction has been extensively addressed with emerging deep learning techniques (Kulmanov et al., 2017; Cao and Shen, 2021a; Yuan et al., 2023a; Wang et al., 2023).…”
Section: Introduction. Citation type: mentioning. Confidence: 99%
“…Furthermore, transformer models have revolutionized the field by enabling the development of large Protein Language Models (LPLMs), which have emerged as transformative tools in computational biology and bioinformatics [23,24]. Similar to the large language models (LLMs) of NLP, which are trained on large corpora of words [25], LPLMs have been trained on large protein databases such as BFD100, UniRef50, and UniRef100, containing billions of protein sequences.…”
Section: Related Work. Citation type: mentioning. Confidence: 99%