2023
DOI: 10.48550/arxiv.2301.06568
Preprint

Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling

Cited by 21 publications (30 citation statements)
“…As a baseline, the receptive field was represented as a one-hot encoding of the amino acids, i.e., no pLM was used. Then, multiple pLMs were considered in this study: ESM-1 small and ESM-1b [31], ESM-2 [34], ProtT5-XL-U50 [32], CARP-640M [50], Ankh-base [35], and Ankh-large [35] (see Suppl. Table 4 for more details).…”
Section: Methods (mentioning)
confidence: 99%
“…Recently, LMPhosSite utilized protein language models (pLMs) to improve phosphosite prediction by adding single-position embeddings as input features [28]. PLMs are pretrained models that yield enriched, structure-aware sequence representations, instead of merely encoding the amino acid composition of a receptive field in a protein [29][30][31][32][33][34][35]. They have demonstrated value in various tasks, such as few-shot contact map prediction [36], protein structure prediction [34], zero-shot mutation impact prediction [37], or phylogenetic relationship modelling [38].…”
Section: Introduction (mentioning)
confidence: 99%
“…TAPE (Rao et al, 2019) employed self-supervised pretraining on large protein sequences datasets and fine-tuning it on specific tasks to predict protein properties. Ankh (Elnaggar et al, 2023) Combination of sequence and structure. Some other methods merged both sequence and 3D structure information.…”
Section: A1 Related Work (mentioning)
confidence: 99%
“…TAPE (Rao et al, 2019) employed self-supervised pretraining on large protein sequences datasets and fine-tuning it on specific tasks to predict protein properties. Ankh (Elnaggar et al, 2023) utilized protein sequences as input and generates predictions related to protein structure and function. ProGen2 (Madani et al, 2023) generated protein sequences with protein sequences and controllable tags specifying protein properties.…”
Section: Related Work (mentioning)
confidence: 99%
“…Some notable contributions include AlphaFold2, RoseTTAFold, ESMFold, OmegaFold, and EMBER2, which have successfully estimated amino acid sequence-to-structure mapping [7, 8, 9, 10, 11]. More generalized models such as ProtBERT, ProtT5, Ankh, and xTrimoPGLM offer highly effective contextualized sequence representations that map intuitively to protein function, gene ontology, physiochemical properties, and more [12, 13, 14]. Interestingly, some pLM projects have opted for different vocabularies outside of the traditional single-letter amino acid code.…”
Section: Introduction (mentioning)
confidence: 99%