2020
DOI: 10.1101/2020.01.31.929604
Preprint

Annotating Gene Ontology terms for protein sequences with the Transformer model

Abstract: Predicting functions for novel amino acid sequences is a long-standing research problem. The UniProt database, which contains protein sequences annotated with Gene Ontology (GO) terms, is one commonly used training dataset for this problem. Predicting protein functions can then be viewed as a multi-label classification problem where the input is an amino acid sequence and the output is a set of GO terms. Recently, deep convolutional neural network (CNN) models have been introduced to annotate GO terms…
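As a concrete illustration of the multi-label framing described in the abstract, the sketch below builds a small Transformer encoder over tokenized amino acid sequences and emits one logit per GO term. This is not the paper's implementation: the vocabulary size, label-space size, and all hyperparameters are placeholder assumptions, and positional encoding is omitted for brevity.

# Minimal sketch of multi-label GO-term prediction with a Transformer
# encoder. All hyperparameters below are illustrative assumptions,
# not values from the paper; positional encoding is omitted.
import torch
import torch.nn as nn

NUM_AMINO_ACIDS = 26   # 20 standard residues + ambiguity codes + padding
NUM_GO_TERMS = 1000    # placeholder label-space size

class GoTransformer(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2,
                 num_go_terms=NUM_GO_TERMS):
        super().__init__()
        self.embed = nn.Embedding(NUM_AMINO_ACIDS, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_go_terms)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h = self.encoder(self.embed(tokens))  # (batch, seq_len, d_model)
        pooled = h.mean(dim=1)                # average over positions
        return self.head(pooled)              # one logit per GO term

model = GoTransformer()
tokens = torch.randint(1, NUM_AMINO_ACIDS, (8, 200))  # toy batch
logits = model(tokens)
# Multi-label loss: each GO term is an independent binary decision.
loss = nn.BCEWithLogitsLoss()(logits, torch.rand(8, NUM_GO_TERMS).round())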

Cited by 9 publications (5 citation statements) | References 17 publications
“…Our contributions are as follows. First, TALE replaces previously used convolutional neural networks (CNNs) with self-attention-based transformers (Vaswani et al., 2017), which have made a major breakthrough in natural language processing and recently in protein sequence embedding (Rives et al., 2019; Duong et al., 2020; Elnaggar et al., 2020). Compared to CNNs, transformers can handle global dependencies within the sequence in just one layer, which helps them detect global sequence patterns for function prediction more easily than CNN-based methods do.…”
Section: Introduction (mentioning)
confidence: 99%
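The global-dependency claim quoted above can be made concrete with a toy scaled dot-product attention computation: in a single layer, the L-by-L attention matrix couples every pair of sequence positions directly, whereas a convolution's receptive field grows only with depth. The NumPy sketch below is illustrative only; the shapes and data are made up.

# Toy scaled dot-product attention (Vaswani et al., 2017). In ONE
# attention layer, every sequence position can attend to every other
# position; a convolution only sees a local window per layer.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (L, L): all pairs of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                     # each output mixes ALL inputs

L, d = 200, 64                             # toy sequence length, model dim
x = np.random.randn(L, d)                  # stand-in residue embeddings
out = attention(x, x, x)                   # self-attention: Q = K = V = x
assert out.shape == (L, d)
# The (L, L) weight matrix couples residue 0 with residue 199 directly;
# a width-3 CNN would need ~100 layers for the same receptive field.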
“…TALE (Cao and Shen, 2021) employed a transformer-based deep-learning model with a joint embedding of sequence inputs and hierarchical function labels. While effectively capturing GO term inter-relationships remains a great challenge, a recent study (Duong et al., 2020) shows that incorporating the hierarchical structure of the GO graph can enable the annotation model to emphasize the GO label distribution, thereby benefiting the final prediction.…”
Section: Introduction (mentioning)
confidence: 99%
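One standard way to exploit the GO hierarchy mentioned above (distinct from the joint-embedding approach TALE itself takes) is the true-path rule: a protein annotated with a GO term implicitly carries all of that term's ancestors, so labels are expanded up the DAG before training or evaluation. The sketch below uses a tiny hypothetical "parents" map, not real GO data.

# Hedged sketch of the GO "true-path rule": expand each protein's GO
# labels with every ancestor term in the DAG. The parents map below is
# a made-up three-term chain, not real GO structure.
parents = {                       # child GO term -> its direct parents
    "GO:0004672": {"GO:0016301"},           # protein kinase activity
    "GO:0016301": {"GO:0003824"},           # kinase activity
    "GO:0003824": set(),                    # catalytic activity (root here)
}

def ancestors(term, dag=parents):
    """All ancestors of `term`, found by walking up the DAG."""
    out = set()
    stack = list(dag.get(term, ()))
    while stack:
        p = stack.pop()
        if p not in out:
            out.add(p)
            stack.extend(dag.get(p, ()))
    return out

def propagate(labels):
    """Expand a protein's GO labels with every ancestor term."""
    full = set(labels)
    for t in labels:
        full |= ancestors(t)
    return full

print(propagate({"GO:0004672"}))
# -> contains all three terms: the annotation plus both ancestors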
“…There is a dire need for efficient and accurate protein function annotation tools for the community to study the growing sequence databases [2][3][4]. Many computational methods have been developed to annotate protein functions based on primary sequences [5][6][7][8][9], protein family and domain annotations [10][11][12], protein-protein interaction (PPI) networks [7,9,10], and other hand-crafted features [8,9,13]. Critical Assessment of Functional Annotation (CAFA), a community-driven benchmark effort for automated protein function annotation, has shown that integrative prediction methods that combine multiple information sources usually outperform sequence-based methods [2][3][4].…”
Section: Introduction (mentioning)
confidence: 99%