Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification

Moris, Pieter; Pauw, Joey De; Postovskaya, Anna; Gielis, Sofie; Neuter, Nicolas De; Bittremieux, Wout; Ogunjimi, Benson; Laukens, Kris; Meysman, Pieter

doi:10.1093/bib/bbaa318

Cited by 120 publications

(225 citation statements)

References 35 publications



“…This can be achieved by shuffling the sequences, thereby associating TCRs with epitopes that they have not been shown to bind. Due to the low probability of a randomly drawn TCR binding a specific epitope, this manner of generating negative samples is established in the field (Fischer et al, 2020; Moris et al, 2020). It has also been shown to limit overestimation of performances in comparison to adding additional naive TCR sequences from other sources (Moris et al, 2020).…”

Section: Methods (mentioning)

confidence: 99%

“…Due to the low probability of a randomly drawn TCR binding a specific epitope, this manner of generating negative samples is established in the field (Fischer et al, 2020; Moris et al, 2020). It has also been shown to limit overestimation of performances in comparison to adding additional naive TCR sequences from other sources (Moris et al, 2020). Furthermore, by shuffling the pairing of TCRs and epitopes, we can match the number of negative examples to that of positive examples for each TCR, avoiding unbalanced datasets.…”

Section: Methods (mentioning)

confidence: 99%
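The shuffling strategy described in the two statements above can be illustrated with a minimal sketch. It is not the code of any cited paper: the function name `shuffled_negatives`, the one-negative-per-positive policy, and the example sequences are assumptions made purely for illustration.

```python
import random

def shuffled_negatives(positive_pairs, seed=0):
    """Build negative (tcr, epitope) pairs by re-pairing TCRs with epitopes.

    Hypothetical helper: for every positive pair, one epitope is drawn that
    the same TCR has NOT been shown to bind, so each TCR ends up with as
    many negatives as positives (balanced classes).
    """
    rng = random.Random(seed)
    known = set(positive_pairs)
    epitopes = sorted({epitope for _, epitope in positive_pairs})
    negatives = []
    for tcr, _ in positive_pairs:
        # Only epitopes without a recorded binding for this TCR qualify.
        candidates = [ep for ep in epitopes if (tcr, ep) not in known]
        if candidates:  # skipped if the TCR binds every known epitope
            negatives.append((tcr, rng.choice(candidates)))
    return negatives

# Made-up example data (CDR3 sequence, epitope).
positives = [
    ("CASSLGF", "GILGFVFTL"),
    ("CASSIRS", "NLVPMVATV"),
    ("CASSPGT", "GLCTLVAML"),
]
negatives = shuffled_negatives(positives)
```

Such negatives are only presumed non-binders. The approach relies on the low prior probability, noted in the statements, that a randomly drawn TCR binds a given epitope.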

“…These have the potential to predict binding of any TCR–epitope pair, opening the door to the development of models that can generalize to both, unseen TCRs and epitopes. Current models show moderate performance on test data containing epitopes already encountered in training, but cannot extrapolate to unseen epitopes (Jurtz et al, 2018; Moris et al, 2020; Springer et al, 2019).…”

Section: Introduction (mentioning)

confidence: 98%


TITAN: T-cell receptor specificity prediction with bimodal attention networks

2021


Motivation: The activity of the adaptive immune system is governed by T-cells and their specific T-cell receptors (TCR), which selectively recognize foreign antigens. Recent advances in experimental techniques have enabled sequencing of TCRs and their antigenic targets (epitopes), allowing to research the missing link between TCR sequence and epitope binding specificity. Scarcity of data and a large sequence space make this task challenging, and to date only models limited to a small set of epitopes have achieved good performance. Here, we establish a k-nearest-neighbor (K-NN) classifier as a strong baseline and then propose Tcr epITope bimodal Attention Networks (TITAN), a bimodal neural network that explicitly encodes both TCR sequences and epitopes to enable the independent study of generalization capabilities to unseen TCRs and/or epitopes.

Results: By encoding epitopes at the atomic level with SMILES sequences, we leverage transfer learning and data augmentation to enrich the input data space and boost performance. TITAN achieves high performance in the prediction of specificity of unseen TCRs (ROC-AUC 0.87 in 10-fold CV) and surpasses the results of the current state-of-the-art (ImRex) by a large margin. Notably, our Levenshtein-based K-NN classifier also exhibits competitive performance on unseen TCRs. While the generalization to unseen epitopes remains challenging, we report two major breakthroughs. First, by dissecting the attention heatmaps, we demonstrate that the sparsity of available epitope data favors an implicit treatment of epitopes as classes. This may be a general problem that limits unseen epitope performance for sufficiently complex models. Second, we show that TITAN nevertheless exhibits significantly improved performance on unseen epitopes and is capable of focusing attention on chemically meaningful molecular structures.

Availability and implementation: The code as well as the dataset used in this study is publicly available at https://github.com/PaccMann/TITAN.

Supplementary information: Supplementary data are available at Bioinformatics online.
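To make the Levenshtein-based K-NN baseline mentioned in the abstract concrete, here is a minimal sketch under simplifying assumptions. It classifies a single sequence by majority vote among its k nearest training sequences in edit distance. The function names, the value of k, the binary labels, and the example sequences are all hypothetical. The abstract does not specify how the published baseline combines TCR and epitope distances, so this should not be read as the authors' implementation.

```python
from collections import Counter

def levenshtein(a, b):
    """Edit distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def knn_predict(query, train, k=3):
    """Majority label among the k training sequences closest to `query`.

    `train` is a list of (sequence, label) tuples; distance ties keep the
    training order because Python's sort is stable.
    """
    ranked = sorted(train, key=lambda item: levenshtein(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up training data: label 1 = binds a given epitope, 0 = does not.
train = [("CASSLGF", 1), ("CASSLGY", 1), ("CASSLAF", 1),
         ("CAWTQQQ", 0), ("CAWTQQE", 0)]
prediction = knn_predict("CASSLGW", train, k=3)
```

A baseline of this kind has no learned parameters, which is part of why it is useful for checking whether a more complex model adds value on unseen TCRs.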



“…Together this information shapes the foundation for AIRR-based diagnostics [6, 10-13]. Similarly, sequence-based prediction of antigen and epitope binding is of fundamental importance for AIR-based therapeutics discovery and engineering [14-24]. In this manuscript, the term AIRR signifies both AIRs and AIRRs (a collection of AIRs) if not specified otherwise.…”

Section: Introduction (mentioning)

confidence: 99%

“…Briefly, (i) ~10^8-10^10 distinct AIRs exist in a given individual at any one time [29-31], with little overlap among individuals, necessitating encodings that allow detection of predictive patterns. These shared patterns may correspond to full-length AIRs [6] or subsequences [15] alternative representations thereof [11, 12, 16, 17, 21, 32-34]. (ii) In repertoire-based ML, the patterns relevant to any immune state may be as rare as one antigen-binding AIR per million lymphocytes in a repertoire [35] translating into a very low rate of relevant sequences per repertoire (low witness rate) [11, 36, 37].…”

Section: Introduction (mentioning)

confidence: 99%

immuneML: an ecosystem for machine learning analysis of adaptive immune receptor repertoires

Pavlović, Scheffer, Motwani, et al. 2021

Preprint


Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.
