B cells undergo somatic hypermutation (SHM) of the Immunoglobulin (Ig) variable region to generate high-affinity antibodies. SHM relies on the activity of activation-induced deaminase (AID), which mutates C>U preferentially targeting WRC (W=A/T, R=A/G) hotspots. Downstream mutations at WA Polymerase h hotspots contribute further mutations. Computational models of SHM can describe the probability of mutations essential for vaccine responses. Previous studies using short subsequences (k-mers) failed to explain divergent mutability for the same k-mer. We developed the DeepSHM (Deep learning on SHM) model using k-mers of size 5-21, improving accuracy over previous models. Interpretation of DeepSHM identified an extended WWRCT motif with particularly high mutability. Increased mutability was further associated with lower surrounding G content. Our model also discovered a conserved AGYCTGGGGG (Y=C/T) motif within FW1 of IGHV3 family genes with unusually high T>G substitution rates. Thus, a wider sequence context increases predictive power and identifies features that drive mutational targeting.
The somatic hypermutation (SHM) of Immunoglobulin (Ig) genes is a key process during antibody affinity maturation in B cells. The mutagenic enzyme activation induced deaminase (AID) is required for SHM and has a preference for WRC hotspots in DNA. Error-prone repair mechanisms acting downstream of AID introduce further mutations, including DNA polymerase eta (Polη), part of the non-canonical mismatch repair pathway (ncMMR), which preferentially generates mutations at WA hotspots. Previously proposed mechanistic models lead to a variety of predictions concerning interactions between hotspots, for example, how mutations in one hotspot will affect another hotspot. Using a large, high-quality, Ig repertoire sequencing dataset, we evaluated pairwise correlations between mutations site-by-site using an unbiased measure similar to mutual information which we termed “mutational association” (MA). Interactions are dominated by relatively strong correlations between nearby sites (short-range MAs), which can be almost entirely explained by interactions between overlapping hotspots for AID and/or Polη. We also found relatively weak dependencies between almost all sites throughout each gene (longer-range MAs), although these arise mostly as a statistical consequence of high pairwise mutation frequencies. The dominant short-range interactions are also highest within the most highly mutating IGHV sub-regions, such as the complementarity determining regions (CDRs), where there is a high hotspot density. Our results suggest that the hotspot preferences for AID and Polη have themselves evolved to allow for greater interactions between AID and/or Polη induced mutations.
B-cells undergo somatic hypermutation (SHM) of the Immunoglobulin (Ig) variable region to generate high-affinity antibodies. SHM relies on the activity of activation-induced deaminase (AID), which mutates C>U preferentially targeting WRC (W=A/T, R=A/G) hotspots. Downstream mutations at WA Polymerase η hotspots contribute further mutations. Computational models of SHM can describe the probability of mutations essential for vaccine responses. Previous studies using short subsequences (k-mers) failed to explain divergent mutability for the same k-mer. We developed the DeepSHM (Deep learning on SHM) model using k-mers of size 5-21, improving accuracy over previous models. Interpretation of DeepSHM identified an extended DWRCT (D=A/G/T) motif with particularly high mutability. Increased mutability was further associated with lower surrounding G content. Our model also discovered a conserved AGYCTGGGGG (Y=C/T) motif within FW1 of IGHV3 family genes with unusually high T>G substitution rates. Thus, a wider sequence context increases predictive power and identifies novel features that drive mutational targeting.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.