While many genetic causes of movement disorders have been identified, modifiers of disease expression are largely unknown. X-linked dystonia-parkinsonism (XDP) is a neurodegenerative disease caused by a SINE-VNTR-Alu(AGAGGG)n retrotransposon insertion in TAF1, with an expanded (AGAGGG)n repeat. Repeat length and variants in MSH3 and PMS2 explain ∼65% of the variance in age at onset (AAO) in XDP. However, additional genetic modifiers, such as repeat interruptions, may also be at play in XDP.
Long-read Nanopore sequencing of PCR amplicons from XDP patients (n = 202) was performed to assess potential repeat interruption and instability. Repeat-primed PCR and Cas9-mediated targeted enrichment confirmed the presence of identified divergent repeat motifs.
In addition to the canonical pure SINE-VNTR-Alu-5’-(AGAGGG)n, we observed a mosaic of divergent repeat motifs that polarized at the beginning of the tract. The divergent repeat interruptions varied in motif length, having one, two, or three nucleotides fewer than the hexameric motif. This distinguishes them from interruptions in other disease-associated repeats, which match the lengths of the canonical motifs. All divergent configurations occurred mosaically, both in the two investigated brain regions (basal ganglia, cerebellum) and in blood-derived DNA from the same patient. The most common divergent interruption was AGG [5’-SINE-VNTR-Alu(AGAGGG)2AGG(AGAGGG)n], similar to the pure tract, followed by AGGG [5’-SINE-VNTR-Alu(AGAGGG)2AGGG(AGAGGG)n], at median frequencies of 0.425 (IQR: 0.42-0.43) and 0.128 (IQR: 0.12-0.13), respectively. The mosaic AGG motif was not associated with repeat number (estimate = -3.8342, p = 0.869). The mosaic pure tract frequency was associated with repeat number (estimate = 45.32, p = 0.0441) but not with AAO (estimate = -41.486, p = 0.378). Importantly, the mosaic frequency of AGGG correlated negatively with repeat number after adjusting for age at sampling (estimate = -161.09, p = 3.44 × 10⁻⁵). When the XDP-relevant MSH3/PMS2 modifier SNPs were included in the model, the mosaic AGGG frequency was associated with AAO (estimate = 155.1063, p = 0.047); however, the association dissipated after the repeat number was also included (estimate = -92.46430, p = 0.079).
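As an illustration only, the three tract configurations named above (pure, AGG-interrupted, AGGG-interrupted) can be written as patterns over a repeat tract sequence, and a per-sample mosaic frequency can be computed as the fraction of single-molecule tracts in each class. The Python sketch below is not the study's analysis pipeline. The function names `classify_tract` and `mosaic_frequencies`, the requirement of at least one hexamer after the interruption, and the assumption that tracts have already been extracted from reads and are free of sequencing errors are all hypothetical simplifications.

```python
import re
from collections import Counter

# Hypothetical patterns for the configurations described in the text.
# Assumption: the interruption follows exactly two canonical hexamers and is
# followed by at least one further hexamer; input tracts are error-free.
PATTERNS = {
    "pure": re.compile(r"^(?:AGAGGG)+$"),
    "AGG": re.compile(r"^(?:AGAGGG){2}AGG(?:AGAGGG)+$"),
    "AGGG": re.compile(r"^(?:AGAGGG){2}AGGG(?:AGAGGG)+$"),
}


def classify_tract(tract: str) -> str:
    """Return the configuration label of one repeat tract, or 'other'."""
    for label, pattern in PATTERNS.items():
        if pattern.match(tract):
            return label
    return "other"


def mosaic_frequencies(tracts: list[str]) -> dict[str, float]:
    """Fraction of tracts (one per sequenced molecule) in each configuration."""
    if not tracts:
        return {}
    counts = Counter(classify_tract(t) for t in tracts)
    return {label: n / len(tracts) for label, n in counts.items()}


if __name__ == "__main__":
    # Toy input: four made-up tracts standing in for reads from one sample.
    example = [
        "AGAGGG" * 40,
        "AGAGGG" * 2 + "AGG" + "AGAGGG" * 38,
        "AGAGGG" * 2 + "AGG" + "AGAGGG" * 38,
        "AGAGGG" * 2 + "AGGG" + "AGAGGG" * 38,
    ]
    print(mosaic_frequencies(example))
```

Because the three patterns imply different tract lengths modulo six, a tract can match at most one of them. Real Nanopore reads contain sequencing errors, so an actual analysis would need error-tolerant matching rather than exact regular expressions.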
We reveal novel mosaic Divergent Repeat Interruptions affecting both motif Length and Sequence (DRILS) of the canonical motif, polarized within the expanded SINE-VNTR-Alu(AGAGGG)n repeat of TAF1. Our study illustrates: 1) the importance of somatic mosaic genotypes; 2) the biological plausibility of multiple modifiers (both germline and somatic) that can have additive effects on repeat instability; 3) the possibility that these variations remain undetected without assessment of single molecules.