2020
DOI: 10.48550/arxiv.2007.06225
Preprint

ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing

Cited by 101 publications (215 citation statements)
References 0 publications
“…• However, the same is not true for pretraining with a large-scale protein sequence prediction task (Elnaggar et al, 2020). Pretraining with this task in fact mostly deteriorates the performance on the downstream semantic parsing tasks, suggesting, contrary to some recent claims (Lu et al, 2021), that pretrained representations do not transfer universally and that there has to be a certain kind and degree of similarity between the pretraining and downstream tasks for successful transfer.…”
mentioning
confidence: 79%
“…As we mentioned in related work, pre-trained LM, such as SeqVec [23] and ProtBert [24], already proved their performance to capture rudimentary features of proteins such as secondary structures, biological activities, and functions [22,21]. Especially, it was shown that SeqVec [23] is better than ProtBert [24] to extract high-level features related functions for PFP [5]. SeqVec [23] is utilized as a protein sequence encoder.…”
Section: Protein Sequence Encoding
mentioning
confidence: 96%
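The citation above uses SeqVec and ProtBert as fixed encoders that turn a protein sequence into feature vectors for downstream protein function prediction. A minimal sketch of that encoding step, assuming the HuggingFace `transformers` library and the publicly released `Rostlab/prot_bert` checkpoint; the example sequence and the mean-pooling step are illustrative choices, not taken from the cited work:

```python
# Sketch: embed a protein sequence with a pre-trained ProtBert model
# (assumes HuggingFace transformers and the Rostlab/prot_bert checkpoint).
import re
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertModel.from_pretrained("Rostlab/prot_bert")
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical example sequence
# ProtBert expects space-separated residues; rare amino acids are mapped to X.
spaced = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(spaced, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-residue embeddings, dropping the [CLS] and [SEP] special tokens.
residue_embeddings = outputs.last_hidden_state[0, 1:-1]  # shape (L, 1024)
# One fixed-size vector per protein, e.g. as input to a function-prediction classifier.
protein_embedding = residue_embeddings.mean(dim=0)       # shape (1024,)
print(protein_embedding.shape)
```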
“…With the advent of transformers [19], which are attention-based model, in Natural Language Processing (NLP), various attention-based LMs were applied to protein sequence embedding [20,21,22,23,24]. As protein sequences can be considered as sentences, these learned the relationship between amino acids constituting the sequence and learned contextual information.…”
Section: Protein Sequence Feature Extraction
mentioning
confidence: 99%
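The statement above treats a protein sequence as a sentence of amino-acid tokens whose pairwise relationships are learned by attention. A minimal, self-contained sketch of scaled dot-product self-attention over such a token sequence; the toy dimensions and randomly initialized weights are purely illustrative and are not the ProtTrans models:

```python
# Sketch: single-head self-attention over an amino-acid "sentence" (toy example).
import torch
import torch.nn.functional as F

residues = list("MKTAYIAKQR")                    # protein sequence as a token list
vocab = {aa: i for i, aa in enumerate(sorted(set(residues)))}
ids = torch.tensor([vocab[aa] for aa in residues])

d_model = 16                                     # toy embedding size
embed = torch.nn.Embedding(len(vocab), d_model)
Wq, Wk, Wv = (torch.nn.Linear(d_model, d_model) for _ in range(3))

x = embed(ids)                                   # (L, d_model) token embeddings
q, k, v = Wq(x), Wk(x), Wv(x)
weights = F.softmax(q @ k.T / d_model ** 0.5, dim=-1)  # (L, L) residue-residue attention
context = weights @ v                            # contextualized residue representations
print(weights.shape, context.shape)
```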
“…We compare with CNN-based models and GNN-based models which learn the protein annotations using 3D structures from scratch. For fair comparison, we do not include LSTM-based or transformer-based methods, as they all pre-train their models using millions of protein sequences and only fine-tune their models on 3D structures (Bepler & Berger, 2019;Alley et al, 2019;Rao et al, 2019;Strodthoff et al, 2020;Elnaggar et al, 2020).…”
Section: Model Quality Assessment
mentioning
confidence: 99%