2021
DOI: 10.1101/2021.11.10.468064
Preprint

Deciphering the language of antibodies using self-supervised learning

Abstract: An individual’s B cell receptor (BCR) repertoire encodes information about past immune responses, and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our fundamental understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. One of the grand challenges of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here we present an antibody-specific language model, AntiBER…



Cited by 11 publications (16 citation statements)
References 48 publications
“…Therefore, our framework allows for optimizing the generative output of deep learning approaches in future benchmarking studies. 11,62,63 In addition, further research is needed to understand the relationship between signal (pattern) complexity, encoding and embedding, 20 and the number of sequences needed for achieving satisfactory generation quality.…”
Section: Discussion (mentioning)
Confidence: 99%
“…Therefore, the models are not evaluated in terms of which downstream tasks can be applied via transfer learning. Recently, attempts have appeared that utilize large language models in repertoire analysis (133-137, 180). In AntiBERTa (137), fine-tuning for a downstream task is also investigated.…”
Section: Discussion (mentioning)
Confidence: 99%
“…Some papers perform their own pre-training on the repertoire sequencing dataset. In Leem et al. (136), each amino acid in a TCR is treated as a word, and a TCR is treated as a sentence to pre-train a BERT language model (AntiBERTa). AntiBERTa achieved a higher ROC-AUC in a paratope prediction task than other tools.…”
Section: Embedding Methods Based on Representation Learning (mentioning)
Confidence: 99%
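The masked-language-model setup described in that statement (one token per amino acid, one sequence per "sentence") can be sketched in a few lines. The following is an illustrative PyTorch/Hugging Face example, not the authors' code; the vocabulary, model dimensions, masking rate, and toy sequences are all assumptions made for the sketch.

```python
# Minimal sketch (not the authors' code): masked-language-model pre-training on
# antibody/TCR sequences, treating each amino acid as a word and each sequence
# as a sentence, in the spirit of AntiBERTa. Vocabulary, model size, and masking
# rate below are illustrative assumptions.
import random
import torch
from transformers import BertConfig, BertForMaskedLM

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIALS = ["[PAD]", "[CLS]", "[SEP]", "[MASK]"]
vocab = {tok: i for i, tok in enumerate(SPECIALS + list(AMINO_ACIDS))}

def encode(seq: str) -> list[int]:
    """One token per residue, wrapped in [CLS] ... [SEP]."""
    return [vocab["[CLS]"]] + [vocab[aa] for aa in seq] + [vocab["[SEP]"]]

def mask_tokens(ids: list[int], mask_prob: float = 0.15):
    """Standard MLM masking: predict only the masked positions (-100 elsewhere)."""
    input_ids, labels = [], []
    for tok in ids:
        if tok >= len(SPECIALS) and random.random() < mask_prob:
            input_ids.append(vocab["[MASK]"])
            labels.append(tok)
        else:
            input_ids.append(tok)
            labels.append(-100)  # ignored by the loss
    if all(label == -100 for label in labels):  # ensure at least one target
        input_ids[1], labels[1] = vocab["[MASK]"], ids[1]
    return input_ids, labels

config = BertConfig(vocab_size=len(vocab), hidden_size=128, num_hidden_layers=4,
                    num_attention_heads=4, intermediate_size=256,
                    pad_token_id=vocab["[PAD]"])
model = BertForMaskedLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Toy CDR-like sequences; a real run would use millions of repertoire sequences.
for seq in ["CARDGYSSGWYFDVW", "CASSLAPGATNEKLFF"]:
    input_ids, labels = mask_tokens(encode(seq))
    loss = model(input_ids=torch.tensor([input_ids]),
                 labels=torch.tensor([labels])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The same encoder, once pre-trained, is what a downstream head (e.g., per-residue paratope classification) would be fine-tuned on top of, which is the transfer-learning step the statements above refer to.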
“…Accumulating evidence reveals that many BERT models can be pruned without impacting their predictive prowess, i.e., most heads in the same layers converge to a similar attention pattern, and thus many layers can be consolidated into a single head. 203,204 In biology, attention layers of transformer-based models, including BERT, have been shown to capture long-range interactions in protein and antibody folding by folding AAs that are distant in 1D sequence but spatially adjacent in the 3D structure, to identify active sites and to capture the hierarchy of complex biophysical properties with increasing layer depths 100,205 – properties that are also critical for antibody-antigen binding. Nevertheless, as in NLP, these models remain susceptible to overparameterization and lack of interpretability.…”
Section: Learnability of Antibody–Antigen Binding (mentioning)
Confidence: 99%
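As a companion illustration of the attention analyses mentioned in that statement, the snippet below extracts per-layer, per-head attention maps from a BERT-style encoder run over an amino acid sequence and computes a crude inter-head similarity, the kind of quantity used to argue that redundant heads are pruning candidates. The model here is a randomly initialized stand-in with assumed sizes and a made-up sequence; it does not reproduce the cited papers' pipelines, and a real analysis would load a trained antibody language model.

```python
# Illustrative sketch only: inspecting per-head attention maps of a BERT-style
# encoder over an amino acid sequence, the kind of analysis used to relate
# attention to long-range residue contacts and to flag redundant heads.
import torch
import torch.nn.functional as F
from transformers import BertConfig, BertModel

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
vocab = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}  # 0 = [PAD], 1 = [CLS]

config = BertConfig(vocab_size=len(vocab) + 2, hidden_size=64,
                    num_hidden_layers=2, num_attention_heads=4,
                    intermediate_size=128, pad_token_id=0)
model = BertModel(config)          # randomly initialized stand-in model
model.eval()

seq = "CARDGYSSGWYFDVW"            # toy CDR-H3-like sequence
input_ids = torch.tensor([[1] + [vocab[aa] for aa in seq]])

with torch.no_grad():
    out = model(input_ids=input_ids, output_attentions=True)

# out.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len).
for layer_idx, layer_att in enumerate(out.attentions):
    att = layer_att[0]             # drop the batch dimension
    # For each head, the position most attended to from [CLS]: a crude proxy
    # for where that head "looks" along the sequence.
    top = att[:, 0, :].argmax(dim=-1)
    print(f"layer {layer_idx}: top attended position per head = {top.tolist()}")

# Head-redundancy check: cosine similarity between the flattened attention maps
# of the last layer's heads; near-identical heads are candidates for pruning.
last = out.attentions[-1][0]       # (heads, seq_len, seq_len)
flat = F.normalize(last.reshape(last.shape[0], -1), dim=-1)
sim = flat @ flat.T
print("mean inter-head cosine similarity (last layer):", sim.mean().item())
```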