We review methods for studying nucleotide correlation in DNA sequences and demonstrate two basic properties of the correlation through statistical analysis: the short-range dominance of nucleotide correlation in most DNA sequences, and the coarse-grained evolutionary dependence of the short-range correlation in coding sequences. A corresponding evolutionary mechanism is suggested. Spectral analysis indicates a large inhomogeneity in long-range base correlations across different sequences. Some results on three-dimensional DNA walks are reported. The linguistic differences between coding and noncoding sequences are also indicated.
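As a rough illustration of what a nucleotide correlation at a given separation can look like, the sketch below uses one common definition, C_ab(k) = P_ab(k) - P_a * P_b, where P_ab(k) is the frequency of base a followed by base b at distance k. The abstract does not state which definition the authors use, so this formula and the function name `base_correlation` are assumptions made for illustration only.

```python
from collections import Counter


def base_correlation(seq, a, b, k):
    """Return C_ab(k) = P_ab(k) - P_a * P_b for a linear sequence.

    Assumed definition (not taken from the abstract): P_ab(k) is the
    fraction of positions i with seq[i] == a and seq[i + k] == b, and
    P_a, P_b are the overall base frequencies.
    """
    seq = seq.upper()
    n = len(seq)
    if k < 1 or k >= n:
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    pairs = n - k  # number of position pairs separated by k
    joint = sum(
        1 for i in range(pairs) if seq[i] == a and seq[i + k] == b
    ) / pairs
    counts = Counter(seq)
    return joint - (counts[a] / n) * (counts[b] / n)
```

Evaluating this function over a range of k and watching how quickly the values decay toward zero is one simple way to see whether short-range correlation dominates in a given sequence.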
Non-neighbor interactions between base pairs were taken into account to calculate the angular parameters (Omega, rho and tau) describing the orientation of successive base-pair planes and the translation parameter (D(y)) along the long axis of base-pair steps for 36 independent tetramers. A statistical mechanical model was proposed to predict DNA flexibility, which is mainly related to thermal fluctuations at individual base-pair steps. DNA flexibility can be described by the root-mean-square deviation of the end-to-end distance of the DNA helical structure. The model was then used to investigate extremely flexible patterns in prokaryotic and eukaryotic promoter sequences. The results demonstrated that several extremely flexible regions related to functionally important elements exist in both prokaryotic and eukaryotic promoters, and that DNA flexibility and AT content are highly correlated. The probabilities of finding flexibility patterns in promoter sequences were also estimated statistically. The biological implications were discussed briefly.
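The reported correlation between flexibility and AT content could be checked along the following lines. This is a minimal sketch: the flexibility profile itself must come from the authors' statistical mechanical model, which the abstract does not specify, so here it is simply an input list. The helper names `window_fraction` and `pearson` are hypothetical.

```python
from statistics import mean


def window_fraction(seq, bases, window):
    """Fraction of characters in `bases` for each sliding window of `seq`."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must satisfy 1 <= window <= len(seq)")
    return [
        sum(1 for c in seq[i:i + window] if c in bases) / window
        for i in range(len(seq) - window + 1)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / (sxx * syy) ** 0.5
```

Given a per-window flexibility profile `flex` computed elsewhere with the same window size, `pearson(window_fraction(seq, "AT", window), flex)` would give the correlation coefficient.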
Empirical rules based on tetranucleotide parameters were presented to predict the structural parameters twist (Omega), roll (rho), tilt (tau) and slide (D(y)). A statistical mechanical model was used to analyze the flexibility of the Escherichia coli genome. The replication terminus region displayed a low level of flexibility. A strong correlation was seen between G+C content and flexibility. Average flexibilities in the coding regions were found to be significantly larger than those in non-coding regions. The flexibility characteristics of the 5'-neighborhood of the coding regions and of three classes of sigma promoter sequences in the E. coli genome were also analyzed.
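The comparison of average flexibility between coding and non-coding regions can be sketched as follows, assuming a per-position flexibility profile and a list of coding intervals are already available. Both inputs and the function name `mean_inside_outside` are hypothetical, and the sketch only compares means; it does not test statistical significance.

```python
def mean_inside_outside(profile, regions):
    """Return (mean inside regions, mean outside regions) of `profile`.

    `regions` is a list of (start, end) pairs, 0-based and half-open;
    overlapping regions are merged implicitly via a boolean mask.
    """
    inside = [False] * len(profile)
    for start, end in regions:
        for i in range(max(0, start), min(end, len(profile))):
            inside[i] = True
    in_vals = [v for v, flag in zip(profile, inside) if flag]
    out_vals = [v for v, flag in zip(profile, inside) if not flag]
    if not in_vals or not out_vals:
        raise ValueError("need at least one position inside and one outside")
    return sum(in_vals) / len(in_vals), sum(out_vals) / len(out_vals)
```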