We used DNA sequencing and gel blot surveys to assess the integrity of the chloroplast gene infA, which codes for translation initiation factor 1, in >300 diverse angiosperms. Whereas most angiosperms appear to contain an intact chloroplast infA gene, the gene has repeatedly become defunct in ~24 separate lineages of angiosperms, including almost all rosid species. In four species in which chloroplast infA is defunct, transferred and expressed copies of the gene were found in the nucleus, complete with putative chloroplast transit peptide sequences. The transit peptide sequences of the nuclear infA genes from soybean and Arabidopsis were shown to be functional by their ability to target green fluorescent protein to chloroplasts in vivo. Phylogenetic analysis of infA sequences and assessment of transit peptide homology indicate that the four nuclear infA genes are probably derived from four independent gene transfers from chloroplast to nuclear DNA during angiosperm evolution. Considering this and the many separate losses of infA from chloroplast DNA, the gene has probably been transferred many more times, making infA by far the most mobile chloroplast gene known in plants.
INTRODUCTION

Many genes have been lost from the chloroplast genome during plant and algal evolution. Most of these losses occurred in the murky interval between the original endosymbiosis of a cyanobacterium (with perhaps 2000 protein-coding genes) and the last common ancestor of all existing chloroplast genomes (with ~210 protein-coding genes). Many other genes were lost during the early evolution of photosynthetic eukaryotes, often in parallel in different algal lineages, and some of these losses were the result of gene transfers to the nuclear genome. During the evolution of land plants, relatively few changes occurred to the set of genes found in chloroplast DNA (cpDNA; Palmer and Delwiche, 1998). Nonetheless, the most recent changes are likely to provide the most information about the evolutionary mechanisms involved.

1 To whom correspondence should be addressed. E-mail (in Dublin) khwolfe@tcd.ie; fax 353-1-6798558.

646 The Plant Cell

Among the six completely sequenced chloroplast genomes from angiosperms (excluding the nonphotosynthetic plant Epifagus virginiana; Wolfe et al., 1992a), 74 protein-coding genes are held in common and an additional five are present in only some species. These five genes are accD, ycf1, and ycf2 (pseudogenes in rice and maize; Hiratsuka et al., 1989; Maier et al., 1995), rpl23 (pseudogene in spinach; Thomas et al., 1988), and infA (pseudogene in tobacco, Arabidopsis, and Oenothera elata; Shinozaki et al., 1986; Wolfe et al., 1992b; Sato et al., 1999; Hupfer et al., 2000). Other chloroplast gene losses in angiosperms that have been confirmed by sequencing include rpl22, rps16, and ycf4 (open reading frame 184), all of which have been lost in some or all legumes (Gantt et al., 1991; Nagano et al., 1991; Doyle et al., 1995; K.H. Wolfe, unpublished data), and ycf2 and ndhF, both of which have been lost ...