Some 200 different 5S rRNA sequences from eubacteria, chloroplasts, mitochondria, archaebacteria, and eukaryotes were analyzed for evolutionary kinship relationships and associated sequential features. Group-specific occupation schemes for the 149 positions of an overall alignment were established. Eubacterial, archaebacterial, and intermediate occupation schemes all yield a strongly biased base triplet pattern in one of the three possible reading frames, strongest for eubacterial, chloroplastic, and archaebacterial, but still detectable for mitochondrial and eukaryotic cytoplasmic sequences. The frequency of triplets decays in the order RNY > RNR > YNY > YNR, where R is a purine (guanine or adenine), Y is a pyrimidine (cytosine or uracil), and N is any base. A strong preference for guanine or cytosine was found in all triplet positions. The effects show no exceptions and are clearly above the level of statistical fluctuations.

In this paper, we report a comparative study of the approximately 200 5S rRNA sequences known today. Preliminary analysis of some mainly eubacterial 5S rRNA sequences (1) revealed a clear bias for the presence of a triplet pattern 5' RNY 3', where R is a purine (guanine or adenine), Y is a pyrimidine (cytosine or uracil), and N is any nucleotide. A similar phenomenon was found previously for tRNA sequences (2, 3). While the reading frame for tRNAs is defined through the position of the anticodon and the common assignment of the 5' terminus, 5S rRNA sequences vary in length and therefore had to be tested for each of the three possible reading frames. For each individual sequence, the RNY bias shows up in only one of the frames, varying with respect to the 5'-terminal position; in the two corresponding alternative frames, a weaker YNR bias always appears. We conjectured, and proved with this study, that the variable reading frames can be synchronized through proper alignment.

Our analysis is essentially based on data from two sources.
Most of the sequences are compiled in an alignment produced by Erdmann et al. (4), of which we used only nondegenerate sequences in order to avoid statistical bias. These comprise 115 eukaryotic, 37 eubacterial, 9 chloroplastic, and 1 mitochondrial sequence. In addition, 17 archaebacterial sequences were kindly provided by G. E. Fox, C. R. Woese, and K. R. Luehrsen (personal communication).

All data were filed and processed on a Philips P2000 M computer so as to yield alignment, common reading frames, tree topology, base composition, and periodic patterns.
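
The reading-frame test described in the introduction can be illustrated with a short sketch. It is not the program used in this study. The function names `triplet_class_counts` and `best_rny_frame` and the toy sequence are hypothetical, chosen only for illustration. The sketch classifies each triplet by the purine/pyrimidine class of its first and third base and reports, for each of the three frames, the fraction of triplets that are RNY.

```python
from collections import Counter

PURINES = set("GA")       # R
PYRIMIDINES = set("CU")   # Y


def base_class(base):
    """Return 'R' for a purine, 'Y' for a pyrimidine, None otherwise."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return None


def triplet_class_counts(seq, frame):
    """Count RNY, RNR, YNY, and YNR triplets of seq read in the given frame (0, 1, or 2)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = Counter()
    for i in range(frame, len(seq) - 2, 3):
        first = base_class(seq[i])
        third = base_class(seq[i + 2])
        if first and third:  # skip triplets with unidentified bases
            counts[first + "N" + third] += 1
    return counts


def best_rny_frame(seq):
    """Return (frame with the highest RNY fraction, list of RNY fractions per frame)."""
    fractions = []
    for frame in range(3):
        counts = triplet_class_counts(seq, frame)
        total = sum(counts.values())
        fractions.append(counts["RNY"] / total if total else 0.0)
    best = max(range(3), key=lambda f: fractions[f])
    return best, fractions


# Toy sequence (made up, not a real 5S rRNA): one leading base shifts the RNY frame to 1.
frame, fractions = best_rny_frame("UGACGACGAC")
print(frame, fractions)
```

In the toy sequence, the frame read from the 5' terminus contains only YNR triplets, while frame 1 contains only RNY triplets. Real sequences would show a statistical bias rather than such a clean separation.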
Alignment and Common Reading Frames

Any comparative analysis of base composition and pattern structures is critically dependent on proper alignment of the sequences. There are sufficient homologies and invariances distributed along the entire sequence that assignment of positions does not pose any serious problem for an overwhelming majority of sequences. Minor uncertainties remain for only a few positions in the mitochondrial and some archaebacterial sequences.

Alignment was greatly aided by the determination of master sequences, wh...