2019
DOI: 10.1101/765644
Preprint

Evaluating Representations for Gene Ontology Terms

Abstract: Functions of proteins are annotated by Gene Ontology (GO) terms. As the number of new sequences being collected rises at a faster pace than the number of sequences being annotated with GO terms, there have been efforts to develop better annotation techniques. When annotating protein sequences with GO terms, one key auxiliary resource is the GO data itself. GO terms have definitions consisting of a few sentences describing biological events, and are also arranged in a tree structure with specific terms …

Cited by 12 publications (17 citation statements) · References 27 publications
“…To keep our software GOAT manageable for all users, we implement the Transformer with only one attention head and 12 layers. We initialize the GO embedding E_G as the pre-trained embedding BERT_LAYER12 in [6], which transforms the GO definitions into vectors so that GO terms with related definitions have similar vectors. Using pre-trained GO embeddings reduces the number of parameters in the Transformer, which can also reduce overfitting, lower GPU memory usage, and decrease run time.…”
Section: Results
Mentioning confidence: 99%
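The setup described in this excerpt can be sketched in code. The following is a minimal, hypothetical illustration in PyTorch, not the authors' GOAT implementation: the vocabulary size is a placeholder, and pretrained_go_vectors stands in for the actual BERT_LAYER12 definition embeddings.

```python
# Hypothetical sketch: a Transformer encoder with one attention head and
# 12 layers, whose GO-label embedding E_G is initialized from pre-trained
# definition vectors. Sizes and the random vectors below are stand-ins.
import torch
import torch.nn as nn

num_go_terms, embed_dim = 10000, 768  # assumed sizes; BERT vectors are 768-d

# Stand-in for the pre-trained BERT_LAYER12 definition embeddings.
pretrained_go_vectors = torch.randn(num_go_terms, embed_dim)

# E_G initialized from the pre-trained vectors; freezing it keeps the
# number of trainable parameters down, as the excerpt notes.
go_embedding = nn.Embedding.from_pretrained(pretrained_go_vectors, freeze=True)

encoder_layer = nn.TransformerEncoderLayer(
    d_model=embed_dim, nhead=1, batch_first=True)              # one head
encoder = nn.TransformerEncoder(encoder_layer, num_layers=12)  # 12 layers

# Example: encode a batch of GO-label index sequences.
go_ids = torch.randint(0, num_go_terms, (2, 32))  # (batch, n_labels)
hidden = encoder(go_embedding(go_ids))            # -> (2, 32, 768)
print(hidden.shape)
```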
“…We next evaluate whether our adaptation of the Transformer can learn the co-occurrences of labels. Duong et al. [6] observed in their GO embeddings that when one of a child-parent pair of GO labels describes very broad biological events (i.e., has low information content, IC), their vector representations may be far apart.…”
Section: Results
Mentioning confidence: 99%
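The observation can be illustrated with a quick similarity check. This sketch uses random stand-in vectors rather than the actual embeddings from [6]; it only shows how such a child-parent comparison would be computed.

```python
# Illustrative check: cosine similarity between a child GO term's
# definition embedding and its (very broad, low-IC) parent's embedding.
# The vectors are random stand-ins, not real GO embeddings.
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
child_vec = rng.standard_normal(768)
parent_vec = rng.standard_normal(768)  # stands in for a broad parent term

# For broad parents, this value can be low even though the two terms are
# adjacent in the GO hierarchy -- the effect noted in the excerpt.
print(f"child-parent similarity: {cosine_similarity(child_vec, parent_vec):.3f}")
```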
“…Several such representations have been proposed, one of the first ones being clusDCA [41], which used random walks to learn features that reflect the GO graph topology. More recent approaches make use of advances in Natural Language Processing (NLP) to learn embeddings that reflect semantic meanings based on the term names and/or descriptions [42]. Theoretical work has shown the utility of embedding graph-structured data (as GO terms are) in hyperbolic rather than Euclidean spaces [43].…”
Section: Function Representation
Mentioning confidence: 99%
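As a rough illustration of the hyperbolic-embedding point [43], the sketch below compares Euclidean distance with geodesic distance in the Poincaré ball, where distances grow rapidly near the boundary, a property that suits tree-structured data such as the GO graph. The two points are invented for illustration.

```python
# Minimal sketch: distance in the 2-D Poincare ball vs. Euclidean distance.
# Points must lie strictly inside the unit disk; coordinates are illustrative.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2)) + eps
    return float(np.arccosh(1.0 + 2.0 * sq / denom))

root = np.array([0.05, 0.0])  # a broad term placed near the origin
leaf = np.array([0.90, 0.1])  # a specific term placed near the boundary

print("Euclidean:", np.linalg.norm(root - leaf))    # ~0.86
print("Poincare :", poincare_distance(root, leaf))  # ~2.9, much larger
```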