We conducted a detailed analysis of duplicate genes in three complete genomes: yeast, Drosophila, and Caenorhabditis elegans. We assigned two proteins to the same family if they met two criteria: (1) their similarity is ≥ I, where I = 30% if L ≥ 150 a.a. and I = 0.01n + 4.8L^(-0.32(1 + exp(-L/1000))) if L < 150 a.a., with n = 6 and L the length of the alignable region; and (2) the length of the alignable region between the two sequences is ≥ 80% of the longer protein. We found it very important to delete isoforms (caused by alternative splicing), identical genes listed under different names, and proteins derived from repetitive elements. We estimated that there were 530, 674, and 1,219 protein families in yeast, Drosophila, and C. elegans, respectively, so, as expected, yeast has the smallest number of duplicate genes. However, among duplicate pairs with fewer than 0.01 substitutions per synonymous site (K_S < 0.01), Drosophila has only seven pairs, whereas yeast has 58 pairs and the nematode has 153 pairs. After considering the possible effects of codon usage bias and gene conversion, these numbers became 6, 55, and 147, respectively. Thus, Drosophila appears to have far fewer young duplicate genes than do yeast and the nematode. The larger numbers of duplicate pairs with K_S < 0.01 in yeast and C. elegans were probably caused largely by block duplications. At any rate, it is clear that the genome of Drosophila melanogaster has undergone few gene duplications in the recent past and has far fewer gene families than C. elegans.
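The two family-membership criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `identity_threshold` and `same_family` are hypothetical, similarity is taken to be a fraction between 0 and 1, and the garbled term in the abstract is read as L raised to the power -0.32(1 + exp(-L/1000)). That reading gives a cutoff of roughly 30% at L = 150, so the two branches of the rule meet.

```python
import math


def identity_threshold(aligned_len: int, n: float = 6.0) -> float:
    """Minimum similarity I (as a fraction) for an alignable region of L residues.

    Assumption: the abstract's formula is read as
    I = 0.01*n + 4.8 * L**(-0.32 * (1 + exp(-L/1000))) for L < 150,
    and I = 0.30 for L >= 150.
    """
    if aligned_len <= 0:
        raise ValueError("aligned_len must be positive")
    if aligned_len >= 150:
        return 0.30
    exponent = -0.32 * (1.0 + math.exp(-aligned_len / 1000.0))
    return 0.01 * n + 4.8 * aligned_len ** exponent


def same_family(similarity: float, aligned_len: int, len_a: int, len_b: int) -> bool:
    """Apply both criteria: similarity cutoff and 80% coverage of the longer protein."""
    covers_longer = aligned_len >= 0.80 * max(len_a, len_b)
    return covers_longer and similarity >= identity_threshold(aligned_len)


if __name__ == "__main__":
    # Hypothetical pair: 100 alignable residues, proteins of 120 and 110 residues.
    print(round(identity_threshold(100), 3))
    print(same_family(0.40, 100, 120, 110))
```

For shorter alignable regions the cutoff rises, so a short alignment has to be proportionally more similar before two proteins are grouped together.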
Background: Programmed DNA elimination and reorganization frequently occur during cellular differentiation. Development of the somatic macronucleus in some ciliates presents an extreme case, involving excision of internal eliminated sequences (IESs) that interrupt coding DNA segments (macronuclear destined sequences, MDSs), as well as removal of transposon-like elements and extensive genome fragmentation, leading to 98% genome reduction in Stylonychia lemnae. Approximately 20–30% of the genes are estimated to be scrambled in the germline micronucleus, with coding segment order permuted and present in either orientation on micronuclear chromosomes. Massive genome rearrangements are therefore critical for development. Methodology/Principal Findings: To understand the process of DNA deletion and reorganization during macronuclear development, we examined the population of DNA molecules during assembly of different scrambled genes in two related organisms in a developmental time-course by PCR. The data suggest that removal of conventional IESs usually occurs first, accompanied by a surprising level of error at this step. The complex events of inversion and translocation seem to occur after repair and excision of all conventional IESs and via multiple pathways. Conclusions/Significance: This study reveals a temporal order of DNA rearrangements during the processing of a scrambled gene, with simpler events usually preceding more complex ones. The surprising observation of a hidden layer of errors, absent from the mature macronucleus but present during development, also underscores the need for repair or screening of incorrectly-assembled DNA molecules.
The macronuclear genomes of spirotrichous ciliates are almost entirely polyploid, single-gene chromosomes ("nanochromosomes"). We recently performed a pilot genome project for a member of this group, Oxytricha trifallax (Sterkiella histriomuscorum), in which approximately 2000 nanochromosomes were cloned at random and end-sequenced. Here we describe the global properties of the coding regions predicted for these molecules, including nucleotide composition, codon usage, and intron properties. In identifying splice donor, acceptor, and branch sites, we found that longer introns in Oxytricha have a stronger signal at the donor site than do smaller introns, as has been found for Caenorhabditis elegans and Drosophila, despite the overall small size of the introns. A systematic search for multi-gene chromosomes identified 11 candidate nanochromosomes. We compare the results from this large dataset with those obtained from earlier studies and with statistics recorded from ciliates and other eukaryotes.