Drosophila melanogaster is one of the best-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction, and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs with various lengths and GC content, covering a 2^20 concentration range, as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments as well as to derive standard curves for quantifying the abundance of transcripts. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but we observed significantly larger imprecision than expected under pure Poisson sampling errors. We use the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length as well as stereotypic heterogeneity in coverage across transcripts correlated with position relative to RNA termini and priming sequence bias. These effects lead to biased quantification for short transcripts and individual exons, which is a serious problem for measurements of isoform abundances, but one that can be partially corrected using appropriate models of bias. By using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. By using data collected as part of the model organism and human Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
[Supplemental material is available for this article.]
High-throughput sequencing applications are revolutionizing genome-wide analysis (Mardis 2008; Mortazavi et al. 2008; Celniker et al. 2009; Morozova et al. 2009; Gerstein et al. 2010; Metzker 2010; Roy et al. 2010). RNA-seq offers single-nucleotide resolution, strand specificity, and short-range connectivity through paired-end sequencing. Because of these strengths, there has been great interest in using RNA-seq to distinguish isoforms, calculate expression levels for transcripts, and uncover low abundance RNAs (He et al. 2008; Mortazavi et al. 2008; Nagalakshmi et al. 2008; Sultan et al. 2008; Wang et al. 2008, 2010; Passalacqua et al. 2009; Gerstein et al. 2010; Roy et al. 2010; Trapnell et al. 2010; Berezikov et al. 2011; Graveley et al. 2011).
While there are clear advantages to RNA-seq, it is less clear how well the procedure performs, as several studies have reported conflicting RNA-seq accuracy results. RNA-seq-determined concentrations of six in vitro synthetic transcripts show good linearity (Mortazavi et al. 2008), and in a study using quantitative PCR as the benchmark, RNA-seq showed better performance for genes with high expression, while two-channel microarrays were more sensitive in identifying differential expression between genes with low ex...
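The standard-curve idea mentioned in the abstract above (spike-in RNAs of known input concentration used to calibrate read density) can be illustrated with a minimal sketch. This is not the authors' method: it assumes a plain least-squares line in log2-log2 space, and the function names `fit_standard_curve` and `estimate_concentration`, as well as all numeric values, are hypothetical and made up for illustration.

```python
import math


def fit_standard_curve(concentrations, read_densities):
    """Least-squares fit of log2(density) = slope * log2(concentration) + intercept.

    Assumption: a simple linear model in log-log space; real analyses may
    also model GC-content, length, and positional biases.
    """
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(r) for r in read_densities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(read_density, slope, intercept):
    """Invert the fitted curve to estimate input concentration from read density."""
    return 2 ** ((math.log2(read_density) - intercept) / slope)


# Hypothetical spike-in inputs spanning a 2^20 range (arbitrary units).
spike_conc = [2.0 ** k for k in range(0, 21, 4)]
# Made-up, perfectly linear read densities; real data would be noisy.
spike_density = [0.5 * c for c in spike_conc]

slope, intercept = fit_standard_curve(spike_conc, spike_density)
# A slope near 1 in log-log space corresponds to read density that is
# proportional to RNA input.
unknown = estimate_concentration(32.0, slope, intercept)
```

With the fitted `slope` and `intercept`, the read density of an endogenous transcript can be mapped back to an estimated abundance on the same scale as the spike-in inputs, under the assumption that the transcript behaves like the controls.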
The breadth of genetic and phenotypic variation among inbred strains is often underappreciated because assessments include only a limited number of strains. Evaluation of a larger collection of inbred strains not only provides a greater understanding of this variation but also collectively mimics much of the variation observed in human populations. We used a high-throughput phenotyping protocol to measure females and males of 43 inbred strains for body composition (weight, fat, lean tissue mass, and bone mineral density), plasma triglycerides, high-density lipoprotein and total cholesterol, glucose, insulin, and leptin levels while mice consumed a high-fat, high-cholesterol diet. Mice were fed a chow diet until they were 6-8 wk old and then fed the high-fat diet for an additional 18 wk. As expected, broad phenotypic diversity was observed among these strains. Significant variation between the sexes was also observed for most traits measured. Additionally, the response to the high-fat diet differed considerably among many strains. By testing such a large set of inbred strains for many traits, multiple phenotypes can be considered simultaneously, which aids in the selection of certain inbred strains as models for complex human diseases. These data are publicly available in the web-accessible Mouse Phenome Database (http://www.jax.org/phenome), an effort established to promote systematic characterization of biochemical and behavioral phenotypes of commonly used and genetically diverse inbred mouse strains. Data generated by this effort build on the value of inbred mouse strains as a powerful tool for biomedical research.