Although trinucleotide repeats are present in the exons of numerous human genes, their allele distributions are not well known, and the factors responsible for their intergenic and intragenic variability are not well understood. We have analyzed the length and sequence variation within the most commonly occurring CAG and CTG repeats in a large number of human genes selected to contain the longest reported repeat tracts. Our study revealed that, in genes other than those implicated in the Triplet Repeat Expansion Diseases (TREDs), very long and highly polymorphic repeats are rather infrequent. The length of the pure repeat tract in the most frequent allele was found to correlate well with the rate of repeat length polymorphism, and CAA triplets were shown to be the most frequent CAG repeat interruptions. As both the CAG and CAA triplets code for glutamine, our results may suggest that selective pressure disfavors long uninterrupted CAG repeats in genes and transcripts but not long normal polyglutamine tracts in proteins. This may indicate that the hairpin structures formed in ssDNA and RNA by long pure CAG repeats are selected against because they may impede normal cellular processes.
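
The distinction between a pure CAG tract, a CAA-interrupted tract, and the encoded polyglutamine tract can be made concrete with a minimal Python sketch. It is an illustration only, not the method used in the study: the function names (`longest_pure_tract`, `interruptions`, `longest_glutamine_run`) and the example allele are hypothetical, and the tract is assumed to be supplied in the CAG reading frame.

```python
import re
from collections import Counter

# Both codons encode glutamine, so a CAA interruption breaks the
# pure CAG run in DNA/RNA without breaking the polyglutamine tract.
GLN_CODONS = {"CAG", "CAA"}


def codons(seq: str) -> list[str]:
    """Split an in-frame sequence into complete triplets."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def longest_pure_tract(seq: str, unit: str = "CAG") -> int:
    """Number of units in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def interruptions(tract: str, unit: str = "CAG") -> Counter:
    """Count non-`unit` triplets lying between the first and last `unit`."""
    triplets = codons(tract)
    hits = [i for i, c in enumerate(triplets) if c == unit]
    if not hits:
        return Counter()
    inner = triplets[hits[0]:hits[-1] + 1]
    return Counter(c for c in inner if c != unit)


def longest_glutamine_run(tract: str) -> int:
    """Length, in residues, of the longest run of glutamine codons."""
    best = current = 0
    for c in codons(tract):
        current = current + 1 if c in GLN_CODONS else 0
        best = max(best, current)
    return best


# Made-up allele: 8 CAG, one CAA interruption, then 3 CAG.
allele = "CAG" * 8 + "CAA" + "CAG" * 3

print(longest_pure_tract(allele))     # 8
print(dict(interruptions(allele)))    # {'CAA': 1}
print(longest_glutamine_run(allele))  # 12
```

In this made-up allele the single CAA shortens the longest pure CAG run from 12 units to 8, while the encoded polyglutamine tract remains 12 residues long.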