Background
Next-generation sequencing (NGS) of antibody variable regions has emerged as a powerful tool in systems immunology by providing quantitative molecular information on polyclonal humoral immune responses. Reproducible and robust information on antibody repertoires is valuable for basic and applied immunology studies; thus, it is essential to establish the reliability of antibody NGS data.

Results
We isolated RNA from antibody-secreting cells (ASCs) from either 1 mouse or a pool of 9 immunized mice in order to simulate both normal and high diversity populations. Next, we prepared three technical replicates of antibody libraries by RT-PCR from each diversity scenario, which were sequenced on the Illumina MiSeq platform, resulting in >10^6 paired-end reads (250 bp) per replicate. We then assessed the robustness of antibody repertoire data based on clonal identification defined by the amino acid sequence of either the full-length VDJ region or the complementarity determining region 3 (CDR3). Leveraging modeling approaches adapted from mathematical ecology, we found that in either diversity scenario both CDR3 and VDJ detection nears completeness, indicating deep coverage of ASC repertoires. Additionally, we defined reliability thresholds for accurate quantification and ranking of CDR3s and VDJs. Importantly, we show that both factors, (i) replicate sequencing and (ii) sequencing depth, are crucial for robust CDR3 and VDJ detection and ranking.

Conclusions
In summary, we established widely applicable experimental and computational guidelines for robust antibody NGS and analysis, which will help advance systems immunology studies related to the quantitative profiling of antibody responses following infection and vaccination.

Electronic supplementary material
The online version of this article (doi:10.1186/s12865-014-0040-5) contains supplementary material, which is available to authorized users.
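The abstract above does not name the ecological models used to judge completeness of CDR3 and VDJ detection. As a rough illustration only, the sketch below uses the bias-corrected Chao1 richness estimator, a common choice in ecology and not necessarily the authors' method, and treats each distinct CDR3 amino acid sequence as a "species". The function names and the toy CDR3 strings are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def chao1_coverage(clone_reads: Iterable[str]) -> Tuple[float, float]:
    """Estimate clonal richness and detection completeness for one replicate.

    clone_reads holds one clone identifier (e.g. a CDR3 amino acid
    sequence) per sequencing read. Chao1 is an assumed stand-in for the
    unspecified ecological model in the abstract.
    """
    abundance = Counter(clone_reads)           # reads per clone
    observed = len(abundance)                  # clones seen at least once
    singletons = sum(1 for n in abundance.values() if n == 1)
    doubletons = sum(1 for n in abundance.values() if n == 2)
    # Bias-corrected Chao1: defined even when there are no doubletons.
    estimated = observed + singletons * (singletons - 1) / (2 * (doubletons + 1))
    coverage = observed / estimated if estimated else 0.0
    return estimated, coverage


def clones_in_all_replicates(replicates: List[Iterable[str]]) -> Set[str]:
    """Return clones detected in every technical replicate."""
    clone_sets = [set(reads) for reads in replicates]
    return set.intersection(*clone_sets) if clone_sets else set()


# Hypothetical toy reads: 5 clones, two of them seen only once.
reads = ["CARA"] * 5 + ["CARB"] * 2 + ["CARC"] * 2 + ["CARD", "CARE"]
estimated_richness, coverage = chao1_coverage(reads)
shared = clones_in_all_replicates([reads, ["CARA", "CARB", "CARX"]])
```

A coverage value close to 1 would suggest that few clones remain undetected at the given sequencing depth, while the replicate intersection gives a simple view of which clones are reproducibly detected.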
Adaptive immunity is driven by the ability of lymphocytes to undergo V(D)J recombination and generate a highly diverse set of immune receptors (B cell receptors/secreted antibodies and T cell receptors), and by their subsequent clonal selection and expansion upon molecular recognition of foreign antigens. These principles lead to remarkable, unique and dynamic immune receptor repertoires [1]. Deep sequencing provides increasing evidence for the presence of commonly shared (convergent) receptors across individual organisms within one species [2-4]. Convergent selection of specific receptors towards various antigens offers one explanation for these findings. For example, single cases of convergence have been reported in antibody repertoires of viral infection or allergy [5-8]. Recent studies demonstrate that convergent selection of sequence motifs within T cell receptor (TCR) repertoires can be identified on an even wider scale [9,10]. Here we report that there is extensive convergent selection in antibody repertoires of mice for a range of protein antigens and immunization conditions. We employed a deep learning approach utilizing variational autoencoders (VAEs) to model the underlying process of B cell receptor (BCR) recombination, assuming that data generation follows a Gaussian mixture model (GMM) in latent space. This provides both a latent embedding and cluster labels that group similar sequences, thus enabling the discovery of a multitude of convergent, antigen-associated sequence patterns. Using a linear, one-versus-all support vector machine (SVM), we confirm that the identified sequence patterns are predictive of antigenic exposure and outperform predictions based on the occurrence of public clones. Recombinant expression of both natural and in silico-generated antibodies possessing convergent patterns confirms their binding specificity to target antigens. Our work highlights the extent to which convergence in antibody repertoires can occur and shows how deep learning can be applied for immunodiagnostics and antibody discovery and engineering.
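The one-versus-all classification step described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' pipeline: the VAE and GMM steps are not reproduced, the feature vectors are assumed to be per-repertoire frequencies of latent clusters, and the cohort names, toy values and hyperparameters are all hypothetical. Each binary classifier is a simplified linear SVM trained by subgradient descent on the L2-regularized hinge loss.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def train_binary_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 200, lr: float = 0.1,
                     lam: float = 0.001) -> Model:
    """Fit one linear classifier with hinge loss; y holds -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # L2 regularization: shrink the weights a little on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge loss subgradient: update only if the margin is below 1.
            if yi * score < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_one_vs_all(X: Sequence[Sequence[float]],
                     labels: Sequence[str]) -> Dict[str, Model]:
    """Train one binary classifier per cohort (that cohort versus the rest)."""
    return {
        cohort: train_binary_svm(X, [1 if lab == cohort else -1 for lab in labels])
        for cohort in sorted(set(labels))
    }


def predict(models: Dict[str, Model], x: Sequence[float]) -> str:
    """Return the cohort whose classifier gives the highest decision score."""
    def score(cohort: str) -> float:
        w, b = models[cohort]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Hypothetical data: rows are repertoires, columns are cluster frequencies.
features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
            [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
            [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
cohorts = ["antigen_A", "antigen_A", "antigen_B",
           "antigen_B", "antigen_C", "antigen_C"]
models = train_one_vs_all(features, cohorts)
predictions = [predict(models, x) for x in features]
```

In practice a tested library implementation and held-out repertoires would be used for evaluation; the toy example only shows how cluster-level features, rather than exact public clones, can feed a linear one-versus-all classifier.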