Semi-supervised deconvolution and integration of multi-allelic MHC peptidome data allows for improved MHC antigen presentation and T cell epitope predictions.
Abstract
The set of peptides presented on a cell's surface by MHC molecules is known as the immunopeptidome. Current mass spectrometry (MS) technologies allow for identification of large peptidomes, and studies have proven these data to be a rich source of information for learning the rules of MHC-mediated antigen presentation. Immunopeptidomes are usually poly-specific, containing multiple sequence motifs matching the MHC molecules expressed in the system under investigation. Motif deconvolution, the process of associating each ligand to its presenting MHC molecule(s), is therefore a critical and challenging step in the analysis of MS-eluted MHC ligand data.
Here, we describe NNAlign_MA, a computational method designed to address this challenge and fully benefit from large, poly-specific data sets of MS-eluted ligands. NNAlign_MA simultaneously performs the tasks of i) clustering peptides into individual specificities; ii) automatic annotation of each cluster to an MHC molecule; and iii) training of a prediction model covering all MHCs present in the training set.
NNAlign_MA was benchmarked on large and diverse data sets, covering class I and class II data. In all cases, the method outperformed state-of-the-art methods, effectively expanding the coverage of alleles for which accurate predictions can be made and improving the identification of both eluted ligands and T cell epitopes. Given its high flexibility and ease of use, we expect NNAlign_MA to serve as an effective tool to increase our understanding of the rules of MHC antigen presentation and guide the development of novel T cell-based therapeutics.
Due to the essential role of the MHC in defining immune responses, large efforts have been dedicated to understanding the rules that shape the immunopeptidome, as well as its alterations in disease, either as a result of pathogen infection or cancerous mutation (1).
A crucial step towards defining the immunopeptidome of an individual is the characterization of the binding preferences of MHC molecules. The peptide-binding domain of MHC molecules consists of a groove, with specific amino acid preferences at different positions. MHC class I, by and large, loads peptides between eight and thirteen residues long (2, 3). MHC class II molecules have a binding groove that is open at both ends and can bind much longer peptides, and even whole proteins (4, 5).
Peptide-MHC binding affinity (BA) assays represented the first attempts to study the binding preferences of different MHC molecules in vitro (6, 7). However, BA characterization ignores many in vivo antigen processing and presentation features, such as protein internalization, protease digestion, peptide transport, peptide trimming, and the role of different chaperones involved in the folding of the pMHC complex (8). Further, BA assays are most often conducted one peptide at a time, thus becoming costly, ...