Recent advances in adaptive immune receptor repertoire sequencing have provided abundant B cell receptor (BCR) sequences under various conditions, including vaccination and disease. However, determining the target antigen and epitope specificity of the corresponding antibodies remains a major challenge due to their exceptional sequence diversity. Here, we introduce a novel method to cluster antibodies sharing antigenic targets based on their complementarity-determining region (CDR) sequences. Using the proposed method, we demonstrate that SARS-CoV-2 spike protein receptor-binding domain (RBD) binders and non-RBD binders from publicly available BCR data were classified correctly, with a cluster purity of 95%. These clusters were then leveraged for annotating unlabeled COVID-19 patient BCR data, enabling the discovery of novel anti-RBD antibodies. We further validated the method by clustering BCR repertoires obtained from single-cell immune profiling of diphtheria-tetanus-pertussis (DTP)-vaccinated donors. Antibody expression and antigen-binding assays demonstrated that the clusters exhibited 96% antigen purity, surpassing the apparent 82% purity achieved by assigning antigens to the same B cells using fluorescently labeled DTP antigen probes. Moreover, antibodies within certain clusters were found to possess neutralizing activity, suggesting that CDR clusters contain epitope-level information. Together, this study offers a simple approach for antigen- and epitope-specific BCR discovery that is reproducible, inexpensive, and applicable to a wide range of antigen targets.
IMPORTANCE
Determining antigen and epitope specificity is an essential step in the discovery of therapeutic antibodies as well as in the analysis of adaptive immune responses to disease or vaccination. Despite extensive efforts, deciphering antigen specificity solely from BCR amino acid sequences remains challenging and requires a combination of experimental and computational approaches. Here, we describe and experimentally validate a straightforward approach for grouping antibodies that share antigen and epitope specificities based on their CDR sequence similarity. This approach allows us to identify the specificities of a large number of antibodies whose antigen targets are unknown, using a small fraction of antibodies with well-annotated binding specificities.
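To make the idea concrete, the following minimal Python sketch illustrates how a few annotated antibodies could be used to label unannotated ones within the same CDR cluster, and how cluster purity could be computed. It is not the method used in this study: the similarity measure (fraction of identical positions between equal-length sequences), the single-linkage grouping, the 0.8 threshold, the function names, and the toy sequences and labels are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def hamming_similarity(a, b):
    """Fraction of identical positions; 0.0 for empty or unequal-length input."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_cdrs(seqs, threshold=0.8):
    """Single-linkage clustering (assumed scheme) via union-find over all pairs."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming_similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(seqs)):
        clusters[find(i)].append(i)
    return list(clusters.values())


def annotate(clusters, labels):
    """Give unlabeled members the majority known label of their cluster."""
    out = list(labels)
    for members in clusters:
        known = Counter(labels[i] for i in members if labels[i] is not None)
        if known:
            majority = known.most_common(1)[0][0]
            for i in members:
                if out[i] is None:
                    out[i] = majority
    return out


def purity(clusters, labels):
    """Fraction of labeled sequences that match their cluster's majority label."""
    total = agree = 0
    for members in clusters:
        known = Counter(labels[i] for i in members if labels[i] is not None)
        if known:
            total += sum(known.values())
            agree += known.most_common(1)[0][1]
    return agree / total if total else float("nan")


# Invented toy CDR sequences and labels (None = antigen target unknown).
seqs = ["ARDYYGSSYFDY", "ARDYYGSSYFDV", "ARDYYGSGYFDY",
        "AKGGWELLRFDP", "AKGGWELLRFDS", "ARVPTTVTTYMDV"]
labels = ["RBD", None, "RBD", "non-RBD", None, None]

clusters = cluster_cdrs(seqs)
annotated = annotate(clusters, labels)
cluster_purity = purity(clusters, labels)
```

In this toy input, sequences in a cluster that contains at least one annotated antibody inherit its label, whereas a sequence that clusters with no annotated antibody stays unlabeled. A real analysis would need a similarity measure and threshold chosen and validated for the repertoire at hand.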