Trinucleotide repeat expansion disorders (TREDs) exhibit complex mechanisms of pathogenesis, some of which have been attributed to RNA transcripts of overexpanded CNG repeats, possibly through a gain of function. In this paper, we probe the structures of these expanded transcripts by analyzing the structural diversity of their conformational ensembles. We used graphs to catalog the structures of an NG-(CNG)16-CN oligomer, grouped them into subensembles based on their characters, and calculated the structural diversity and thermodynamic stability of these subensembles using a previously described graph factorization scheme. Our findings show that the generally assumed structure for CNG repeats (a series of canonical helices connected by two-way junctions and capped with a hairpin loop) may not be the most thermodynamically favorable, and that the ensembles are characterized by largely open and less structured conformations. Furthermore, the behavior of the ensembles' diversity as higher-order diagrams are included depends on sequence length, suggesting that further studies of CNG repeats are needed at the length scale of TRED onset to properly understand their structural diversity and how it might relate to their functions.
STATEMENT OF SIGNIFICANCE
Trinucleotide repeats are DNA satellites that are prone to mutations in the human genome. A family of diverse disorders is associated with an overexpansion of CNG repeats occurring in noncoding regions, and the RNA transcripts of the expanded regions have been implicated as the origin of toxicity. Our understanding of the structures of these expanded RNA transcripts is based on sequences that are short compared to the expanded transcripts found in patients. In this paper, we introduce a theoretical method for analyzing the structure and conformational diversity of CNG repeats, which has the potential to overcome the current length limitations in studies of trinucleotide repeat sequences.