The recent increase in immunopeptidomic data, obtained by mass spectrometry (MS) or binding assays, opens unprecedented possibilities for investigating endogenous antigen presentation by the highly polymorphic human leukocyte antigen class I (HLA-I) protein. We introduce a flexible and easily interpretable peptide presentation prediction method. We validate its performance as a predictor of cancer neoantigens and viral epitopes and use it to reconstruct peptide motifs presented on specific HLA-I molecules.

Recognition of malignant and infected cells by the adaptive immune system requires binding of cytotoxic T-cell receptors to antigens, 8-11-mer peptides presented by the Major Histocompatibility Complex (MHC) class I encoded by HLA-I alleles (Fig. 1a). Tumour-specific neoantigens are currently sought-after targets for improving cancer immunotherapy [1, 2]. Computational predictions can help select potential neoantigens and accelerate immunogenicity testing. To be useful, these predictions must be specific to each HLA type.

State-of-the-art methods [3, 4, 5], such as NetMHC [6, 7], are based on artificial neural networks trained in a supervised way to predict peptide presentation from known peptide-HLA associations. They must be trained on large datasets, and perform best on common alleles. Their accuracy is degraded for rare or little-studied HLA-I alleles, which are poorly represented in databases. In that case, another approach is to train unsupervised models of presentation from custom elution experiments with little or no information about peptide-HLA association. For instance, MixMHCp [8, 9] can reconstruct, from unannotated peptide sequences, a mixture of generative models, one for each expressed HLA type.
However, it makes simplifying assumptions about binding specificity, and is not designed to leverage available (albeit limited) annotation information from the Immune Epitope Database (IEDB) [10] to improve accuracy.

We present an alternative method for predicting peptides presented by specific HLA types, which can be trained on custom datasets. It can be applied to patient- or experiment-specific samples and treats common and rare alleles on an equal footing, avoiding potential database biases. We use a Restricted Boltzmann Machine (RBM), an unsupervised machine learning scheme that learns probability distributions of sequences given as input [11, 12, 13]. The RBM estimates presentation scores for each peptide, and can generate candidate presentable peptides. It also provides a lower-dimensional representation of peptides with a clear interpretation in terms of associated HLA type, which can be exploited to train classifiers of HLA association from a small number of HLA annotations. The RBM has a simple structure with one hidden layer and weights connecting the input (the peptide sequence) to the hidden layer. The weights, along with the biases acting on both input and hidden units, are learned from a list of presented peptides (see Methods, Fig. 1b, Supplementary Figs. 1, 2, 3). The RBM probability is interprete...