Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
The sequencing of the human genome and the genomes of dozens of other metazoan species has intensified the need for systematic methods to extract biological information directly from DNA sequence. Comparative genomics has emerged as a powerful methodology for this endeavour [1,2]. Comparison of few (two to four) closely related genomes has proven successful for the discovery of protein-coding genes [3-5], RNA genes [6,7], miRNA genes [8-11] and catalogues of regulatory elements [3,4,12-14]. The resolution and discovery power of these studies should increase with the number of genomes [15-20], in principle enabling the systematic discovery of all conserved functional elements.
The fruitfly Drosophila melanogaster is an ideal system for developing and evaluating comparative genomics methodologies.
Over the past century, Drosophila has been a pioneering model in which many of the basic principles governing animal development and population biology were established [21]. In the past decade, the genome sequence of D. melanogaster provided one of the first systematic views
*These authors contributed equally to this work.
†Lists of participants and affiliations appear at the end of the paper.
The invected and engrailed genes are juxtaposed in the Drosophila genome and are closely related in sequence and pattern of expression. The structure of the most abundant invected transcript was defined by obtaining the full-length cDNA sequence and by S1 nuclease sensitivity and primer extension studies; a partial sequence of the invected gene was determined; and the developmental profile of invected expression was characterized by Northern analysis and by in situ localization. The invected gene, like the engrailed gene, is expressed in the embryonic and larval cells of the posterior developmental compartments and in the embryonic hindgut, clypeolabrum, and nervous system. Like the engrailed gene, the invected gene can encode a protein of approximately 60 kD that contains a homeo box near its carboxyl terminus; indeed, a sequence of 117 amino acids in the carboxy-terminal region of both proteins is almost identical. The developmental role of the invected gene is not known.
Comprehensive knowledge of proteome complexity is crucial to understanding cell function. Amino termini of yeast proteins were identified through peptide mass spectrometry on glutaraldehyde-treated cell lysates as well as a parallel assessment of publicly deposited spectra. An unexpectedly large fraction of detected amino-terminal peptides (35%) mapped to translation initiation at AUG codons downstream of the annotated start codon. Many of the implicated genes have suboptimal sequence contexts for translation initiation near their annotated AUG, and their ribosome profiles show elevated tag densities consistent with translation initiation at downstream AUGs as well as their annotated AUGs. These data suggest that a significant fraction of the yeast proteome derives from initiation at downstream AUGs, substantially increasing the repertoire of encoded proteins and their potential functions and cellular localizations.
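As a rough illustration of what "AUG codons downstream of the annotated start codon" means at the sequence level, the following Python sketch lists in-frame AUGs between an annotated start and the first in-frame stop codon. It is not the method used in the study, which relied on mass spectrometry and ribosome profiling; the function name `downstream_in_frame_augs`, its parameters, and the toy sequence are assumptions made for this example.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code


def downstream_in_frame_augs(mrna, annotated_start):
    """Return 0-based positions of in-frame AUG codons downstream of the
    annotated start codon, scanning until the first in-frame stop codon.

    Hypothetical helper for illustration only; it ignores sequence context
    and simply enumerates candidate downstream initiation sites.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if seq[annotated_start:annotated_start + 3] != "AUG":
        raise ValueError("annotated_start does not point at an AUG codon")

    positions = []
    pos = annotated_start + 3  # first codon after the annotated start
    while pos + 3 <= len(seq):
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the reading frame ends here
        if codon == "AUG":
            positions.append(pos)
        pos += 3
    return positions


# Toy transcript: GG | AUG AAA AUG CCC AUG UAA | AUG
toy = "GGAUGAAAAUGCCCAUGUAAAUG"
print(downstream_in_frame_augs(toy, 2))  # [8, 14]
```

Initiation at position 8 or 14 in this toy sequence would yield proteins lacking the first 2 or 4 residues, respectively. The AUG after the stop codon is not reported because it lies outside the reading frame's coding region.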
A longstanding challenge is to understand how ribosomes parse mRNA open reading frames (ORFs). Significantly, GCN codons are over-represented in the initial codons of ORFs of prokaryote and eukaryote mRNAs. We describe a ribosome rRNA-protein surface that interacts with an mRNA GCN codon when next in line for the ribosome A-site. The interaction surface comprises the edges of two stacked rRNA bases: the Watson–Crick edge of 16S/18S rRNA C1054 and the adjacent Hoogsteen edge of A1196 (Escherichia coli 16S rRNA numbering). Also part of the interaction surface, the planar guanidinium group of a conserved arginine (R146 of yeast ribosomal protein Rps3) is stacked adjacent to A1196. On its other side, the interaction surface is anchored to the ribosome A-site through base stacking of C1054 with the wobble anticodon base of the A-site tRNA. Using molecular dynamics simulations of a 495-residue subsystem of translocating ribosomes, we observed base pairing of C1054 to nucleotide G at position 1 of the next-in-line codon, consistent with previous cryo-EM observations, and hydrogen bonding of A1196 and R146 to C at position 2. Hydrogen bonding to both of these codon positions is significantly weakened when C at position 2 is changed to G, A or U. These sequence-sensitive mRNA-ribosome interactions at the C1054-A1196-R146 (CAR) surface potentially contribute to the GCN-mediated regulation of protein translation.
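The statement that GCN codons are over-represented in the initial codons of ORFs can be made concrete with a simple per-position tally. The Python sketch below is only an illustration under stated assumptions: the abstract does not describe how over-representation was measured, and the function name `gcn_fraction_by_position`, the `n_codons` parameter, and the toy ORFs are invented for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def gcn_fraction_by_position(orfs, n_codons=10):
    """For each of the first n_codons codons after the start codon, return
    the fraction of ORFs whose codon at that position matches GCN.

    Hypothetical helper: a codon matches GCN when its first two
    nucleotides are G and C; the third nucleotide (N) is unconstrained.
    """
    hits = [0] * n_codons
    totals = [0] * n_codons
    for orf in orfs:
        seq = orf.upper().replace("U", "T")  # accept RNA or DNA input
        for i in range(n_codons):
            start = 3 * (i + 1)  # skip the start codon itself
            codon = seq[start:start + 3]
            if len(codon) < 3 or codon in STOP_CODONS:
                break  # ORF ended before this position
            totals[i] += 1
            if codon.startswith("GC"):
                hits[i] += 1
    # Positions that no ORF reaches are reported as 0.0
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


toy_orfs = ["ATGGCTGCAAAATAA", "ATGGCCTTTGCGTAA", "AUGAAAGCUUAA"]
fractions = gcn_fraction_by_position(toy_orfs, n_codons=3)
print([round(f, 3) for f in fractions])  # [0.667, 0.667, 0.5]
```

On real data, these per-position fractions would need to be compared against a background GCN frequency, for example from codons further downstream in the same ORFs, before calling any position over-represented.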