Mutations have been examined in the 1500 interspersed Alu repeats of human DNA that have been sequenced and are nearly full length. There is a set of particular changes at certain positions that rarely occur (termed suppressed changes) compared to the average of [...] (2) that give rise to the sequences and mutation of the inserts in situ after insertion. The special features of the source genes that cause insertion of hundreds of thousands of copies are not known, but the source genes must be transcribed, and the mechanism of insertion is probably retroposition (3). The evolutionary changes in the source genes can be identified because large families of Alu repeats share diagnostic nucleotides at certain positions (2, 4). The relationships of the families of Alu sequences have been reexamined (5), including those judged to be recently inserted (6-9). It appears that several source genes are active at present or have been in the recent past, giving rise to several types of inserted Alu repeats matching each of the sources in sequence (7, 9, 10).
A central aspect of the Alu sequences is that almost all of the source gene sequence has been conserved through the history of the Alu sequences. The diagnostic positions are exceptions, since the nucleotides at these positions changed at some time in the past, giving rise to variant source genes. Some of the variants became predominant new sources of inserted copies, the changed nucleotides were maintained for extended periods of time, and the many copies formed recognizable families of Alu sequences. It is significant that the fully conserved positions of the source genes include most of the CpGs, even though these have changed rapidly after the copies were inserted. As a result of the conservation, there is a large group of positions for which we almost certainly know what the nucleotide was at the time of insertion.
The existence of such positions has been previously shown (2), and the principal set of them (not including the CpGs, the diagnostic positions, and a few other positions) is examined in this work. Evidence is presented for their conservation. The 195 chosen positions are termed the CONSBI (conserved before insertion) positions and are shown as uppercase letters in Fig. 1.
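The uppercase/lowercase convention described above can be sketched in a few lines of Python. This is only an illustration: the string `toy_consensus` and the function `consbi_positions` are hypothetical names, and the toy string is not the Alu consensus of Fig. 1.

```python
# Hypothetical toy consensus, NOT the real Alu consensus from Fig. 1.
# Uppercase = CONSBI (conserved before insertion) position;
# lowercase = excluded position (e.g. CpG, diagnostic, or other).
toy_consensus = "GGccGGgcGCgGTGGCTca"


def consbi_positions(consensus: str) -> list[int]:
    """Return the 1-based positions that are written in uppercase."""
    return [i for i, base in enumerate(consensus, start=1) if base.isupper()]


positions = consbi_positions(toy_consensus)
print(len(positions), "of", len(toy_consensus), "toy positions are marked CONSBI")
```

Applied to the actual consensus of Fig. 1, a mask of this kind would be expected to select the 195 CONSBI positions, so that later per-position tallies can be restricted to them.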
Systematic and Stochastic Changes Identified by Comparing Two Randomly Chosen Sets of Alu Sequences
It is possible to recognize systematic processes as opposed to stochastic events that affect the nucleotides at specific positions by examining the relationship of mutations between subsets of the known sequences. A set of nearly full-length Alu sequences was divided into two equal randomly chosen sets (789 each), and all of the members of each set were compared with the consensus of recently inserted copies (Fig. 1). The divergences at each position were summed, and Fig. 2 is a graph of the fractional divergence of each position in one set plotted against the fractional divergence of the same position in the other set. The correlation of the points along the diagonal is expected, as many of these are diagnos...