One Sentence Summary: Combining advanced natural language processing tools with large-scale dictionaries of T cell receptors and their target peptides precisely predicts whether a T cell will bind a specific target.
Abstract: The T cell repertoire is composed of T cell receptors (TCRs) selected by their cognate MHC-peptides and naive TCRs that do not bind known peptides. While distinguishing a peptide-binding TCR from a naive TCR that is unlikely to bind any peptide can be done using sequence motifs, distinguishing between TCRs that bind different peptides requires more advanced methods. Such a prediction is key to using TCR repertoires as disease-specific biomarkers. Here, we combined large-scale TCR-peptide dictionaries with state-of-the-art natural language processing (NLP) methods to produce ERGO (pEptide tcR matchinG predictiOn), a highly specific classifier that predicts which TCR binds to which peptide. We successfully employed ERGO for two related tasks: discriminating between peptide-binding and naive TCRs, and the more complicated task of distinguishing between TCRs that bind different peptides. We show that ERGO significantly outperforms all current methods for classifying TCRs that bind peptides and, more importantly, can identify the specific target of a TCR among a large set of peptides. The software implementation and data sets are available at: https://github.com/IdoSpringer/ERGO