Tandem repeats in GenBank primate nucleotide sequences annotated as protein coding regions are analyzed. It is found that only trinucleotide repeats show repeat enrichment well above the threshold of statistical significance. The statistics are improved by a simultaneous search for repeats on both the amino acid and nucleotide levels. The results of the analyses of natural sequences are interpreted by comparing them with the results of a computer simulation of a model dedicated to protein coding regions. According to the simulation results, a limited set of trinucleotides, that is, cgg, ccg, cag, and gaa repeats coding for polyalanine, polyglycine, polyproline, polyglutamine, and polylysine are prone to proliferation. It is also found that within the repeat regions slippage is more frequent by a factor of 10 than point mutations, whereas the ratio of silent versus recognizable point mutations is approximately the same as elsewhere in coding regions. The trinucleotide repeats cover slightly more than 0.3% of the protein coding regions of genes.
Tandem repeats with short (1-6 bp) monomer units, also called microsatellites or simple sequence repeats, exist in noncoding genomic regions as well as in regions coding for proteins and structural nucleic acids. In the human genome (Genome Sequencing Consortium 2001) approximately 2% of the nucleotide sequences are in the form of tandem repeats in which the length of the repeat unit is between 1 and 11 bp. The functional role of tandem repeats is poorly understood. They are, however, known to be involved in several genetic diseases, and they can be successfully used as genetic markers. To shed more light on the character of these short sequence repeats, it is worthwhile to determine the type and content of short sequence repeats in coding regions and to make a quantitative comparison with the situation in noncoding regions. Three research groups are involved in just such an effort (Toth et al. 2000; Metzgar et al. 2000; Field and Wills 1998). They have shown that short tandem repeats are much more numerous in noncoding regions than in protein coding regions, and that in coding regions, trinucleotide and hexanucleotide repeats are more frequent than mononucleotide, dinucleotide, tetranucleotide, and pentanucleotide repeats. This latter finding can be explained in terms of the complete impairment of protein function by the frameshift mutation that takes place when an exon receives an insertion or deletion of a segment whose length is not a multiple of the codon length.
The major problem with analyzing coding regions is the scarcity of annotated sequences. To better determine the content and the characteristics of microsatellites in exons we have made two kinds of improvements in the analyses: (1) the counts of tandem repeats in exons were performed on the largest data set possible; and (2) analyses of the counts were interpreted according to a realistic expectation model supported by a computer simulation.
Concerning the first point, it is clear (especially now, because the r...
Short tandem repeats (STRs) are subject to two kinds of mutational modifications: point mutations and replication slippages. The latter is found to be the more frequent cause of STR modifications, but a satisfactory quantitative measure of the ratio of the two processes has yet to be determined. The comparison of entire genome sequences of sufficiently closely related species enables one to obtain sufficient statistics by counting the differences in the STR regions. We analyzed human-chimpanzee DNA sequence alignments to obtain the counts of point mutations and replication slippage modifications. The results were compared with the results of a computer simulation, and the parameters quantifying the replication slippage probability as well as the probabilities of point mutations within the repeats were determined. It was found that within the STRs with repeated units consisting of one, two, or three nucleotides, point mutations occur approximately twice as frequently as one would expect on the basis of the 1.2% difference between the human and chimpanzee genomes. As expected, the replication slippage probability is negligible below a 10-bp threshold and grows above this level. The replication slippage events outnumber the point mutations by one or two orders of magnitude, but are still lower by one order of magnitude relative to the mutability of the markers that are used for genotyping purposes.
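To make the notion of an STR region with a length threshold concrete, the following Python sketch locates perfect tandem repeats with unit lengths of one to three nucleotides and keeps only those spanning at least 10 bp. It is an illustration under stated assumptions, not the procedure used in the study: the function name `find_strs`, its parameters, and the regular-expression approach are all hypothetical, only exact (uninterrupted) repeats are detected, and the 10-bp value is simply borrowed from the threshold quoted above as a minimum-length filter.

```python
import re

def find_strs(seq, max_unit=3, min_len=10):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Hypothetical helper for illustration only: it reports exact repeats whose
    unit is 1..max_unit bases long and whose total span is >= min_len bases.
    Trailing partial copies of the unit are not counted.
    """
    seq = seq.lower()
    hits = []
    for k in range(1, max_unit + 1):
        # One unit of k bases followed by at least one more identical copy.
        pattern = re.compile(r"([acgt]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter motif (e.g. "aa", "aaa"),
            # since those runs are already reported at the shorter unit length.
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            span = m.end() - m.start()
            if span >= min_len:
                hits.append((m.start(), unit, span // k))
    return sorted(hits)

# Toy sequence: a (cag)5 run, a short (ac)3 run below the threshold,
# and a 12-base poly-a run.
example = "atc" + "cag" * 5 + "ttt" + "ac" * 3 + "g" + "a" * 12 + "c"
print(find_strs(example))
```

In this toy input the (ac)3 run spans only 6 bp and is filtered out, while the (cag)5 and poly-a runs pass the length filter. Counting point mutations versus slippage events would additionally require aligned sequences from two species, which this sketch does not attempt.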
PACS. 05.45 - Theory and models of chaotic systems. PACS. 87.10 - General, theoretical, and mathematical biophysics (inc. logic of biosystems, quantum biology and relevant aspects of thermodynamics, information theory, cybernetics, and bionics).