Accurate quantification of transcriptome from RNA-Seq data by effective length normalization

Lee, Mi Kyung; Seo, Chae Hwa; Lim, Byungho; Yang, Jin Ok; Oh, Jeongsu; Kim, Minjin; Lee, Sooncheol; Lee, Byung-Wook; Kang, Changwon; Lee, Sanghyuk

doi:10.1093/nar/gkq1015

Cited by 106 publications

(87 citation statements)

References 15 publications


“…Available abundance estimation methods include direct computation (9,10) and model-based approaches. Many model-based studies (1,11-14) have used maximum-likelihood approaches to estimate isoform abundance.…”

mentioning

confidence: 99%

Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation

Jiang, Brown, et al. 2011. Proc. Natl. Acad. Sci. U.S.A.



Since the inception of next-generation mRNA sequencing (RNA-Seq) technology, various attempts have been made to utilize RNA-Seq data in assembling full-length mRNA isoforms de novo and estimating abundance of isoforms. However, for genes with more than a few exons, the problem tends to be challenging and often involves identifiability issues in statistical modeling. We have developed a statistical method called "sparse linear modeling of RNA-Seq data for isoform discovery and abundance estimation" (SLIDE) that takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to present in an RNA-Seq sample. SLIDE is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with deterministic isoform assembly algorithms (e.g., Cufflinks), SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has increased power in detecting more novel isoforms. Another advantage of SLIDE is its flexibility of incorporating other transcriptomic data such as RACE, CAGE, and EST into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. Besides isoform discovery, SLIDE sequentially uses the same linear model to estimate the abundance of discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation. The SLIDE software package is available at https://sites.google.com/site/jingyijli/SLIDE.zip.

Keywords: mRNA isoform discovery | single-end vs. paired-end sequencing | fragment length distribution | GC contents | penalized estimation

The recently developed next-generation mRNA sequencing (RNA-Seq) assay, with deep coverage and base level resolution, has provided a view of eukaryotic transcriptomes of unprecedented detail and clarity. Unlike microarrays, RNA-Seq data have novel splice junction information in addition to gene expression, thus facilitating whole-transcriptome assembly and mRNA isoform quantification. RNA-Seq data includes both single-end and paired-end reads, where a single-end read is a sequenced end of a cDNA fragment from an mRNA transcript, and a paired-end read is a mate pair corresponding to both ends of a cDNA fragment.

In the mRNA isoform discovery field, one of the most widely used software packages is Cufflinks (1). It builds a set of genes and exons solely from RNA-Seq data first, and subsequently uses a deterministic approach to find a minimal set of isoforms that can explain all the cDNA fragments indicated by paired-end reads. Cufflinks mainly uses qualitative exon expression and junction information in its isoform discovery, lacking a quantitative consideration of RNA-Seq data. Although Cufflinks gives very useful results, we no...


“…Population sizes for the common ancestors (> 8 kya and ~2 kya) of RJF and VC were obtained from our MSMC analysis. Since MSMC has a low power to estimate population size at relatively recent times, the effective population sizes for present day RJF and VC populations were taken from elsewhere [68] (1.6 × 10^5 and 4 × 10^5, respectively). Generation time (g) and mutation rate per year (u) for chicken used here is 1 year and 1.91 × 10^-9, respectively [69].…”

Section: Demographic History and Coalescent Simulations

mentioning

confidence: 99%

“…The expression level (FPKM) for each gene in each tissue was retrieved and transformed according to log2(FPKM + 1) [68]. The difference of expression level for each gene between VC and RJF was calculated using log2((FPKM + 1)_VC / (FPKM + 1)_RJF).…”

Section: Comparison Of Gene Expression

mentioning

confidence: 99%
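The transform quoted in the statement above can be illustrated with a minimal Python sketch. The function names `log2_fpkm` and `log2_ratio` are hypothetical and are not taken from the cited paper; the sketch only assumes that FPKM values are non-negative numbers.

```python
import math


def log2_fpkm(fpkm: float) -> float:
    """Return log2(FPKM + 1); the +1 pseudocount keeps zero expression finite."""
    if fpkm < 0:
        raise ValueError("FPKM must be non-negative")
    return math.log2(fpkm + 1.0)


def log2_ratio(fpkm_vc: float, fpkm_rjf: float) -> float:
    """Return log2((FPKM_VC + 1) / (FPKM_RJF + 1)) as a difference of logs."""
    return log2_fpkm(fpkm_vc) - log2_fpkm(fpkm_rjf)
```

A positive value from `log2_ratio` would indicate higher expression in VC than in RJF, and a negative value the reverse.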

Positive selection rather than relaxation of functional constraint drives the evolution of vision during chicken domestication

Wang, Zhang, et al. 2016. Cell Res


As noted by Darwin, chickens have the greatest phenotypic diversity of all birds, but an interesting evolutionary difference between domestic chickens and their wild ancestor, the Red Junglefowl, is their comparatively weaker vision. Existing theories suggest that diminished visual prowess among domestic chickens reflect changes driven by the relaxation of functional constraints on vision, but the evidence identifying the underlying genetic mechanisms responsible for this change has not been definitively characterized. Here, a genome-wide analysis of the domestic chicken and Red Junglefowl genomes showed significant enrichment for positively selected genes involved in the development of vision. There were significant differences between domestic chickens and their wild ancestors regarding the level of mRNA expression for these genes in the retina. Numerous additional genes involved in the development of vision also showed significant differences in mRNA expression between domestic chickens and their wild ancestors, particularly for genes associated with phototransduction and photoreceptor development, such as RHO (rhodopsin), GUCA1A, PDE6B and NR2E3. Finally, we characterized the potential role of the VIT gene in vision, which experienced positive selection and downregulated expression in the retina of the village chicken. Overall, our results suggest that positive selection, rather than relaxation of purifying selection, contributed to the evolution of vision in domestic chickens. The progenitors of domestic chickens harboring weaker vision may have showed a reduced fear response and vigilance, making them easier to be unconsciously selected and/or domesticated.


“…Gene expression levels were estimated using an in-house developed application, which calculates fragments per kilobase of expressed exons per million mapped reads (FPKM values) in a manner similar to NEUMA (32). In our approach, to calculate an effective length of genes, instead of using simulated data, we used a pooled set of aligned RNA-Seq reads for assessing genome mapability. Further analyses of gene expression results and generation of plots were performed in R (version 3.1.2), with the aid of "plyr" and "ggplot2" packages.…”

Section: Transcriptome Analysis

mentioning

confidence: 99%
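As a rough illustration of effective length normalization, the Python sketch below computes an FPKM value using an effective length in place of the annotated length. It is not the in-house application or NEUMA itself. It assumes, for illustration only, that the effective length is the number of positions flagged as mappable, and the names `effective_length` and `fpkm` are hypothetical.

```python
from typing import Sequence


def effective_length(mappable: Sequence[bool]) -> int:
    """Count positions flagged as mappable (an assumed definition for this sketch)."""
    return sum(1 for flag in mappable if flag)


def fpkm(fragments: int, effective_length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of effective length per million mapped fragments."""
    if effective_length_bp <= 0 or total_mapped <= 0:
        raise ValueError("effective length and total mapped count must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return fragments * 1e9 / (effective_length_bp * total_mapped)
```

For example, `fpkm(500, 2000, 10_000_000)` normalizes 500 fragments over an effective length of 2,000 bp in a library of 10 million mapped fragments. Using a shorter effective length than the annotated length raises the estimate for genes with many unmappable positions.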

Targeting Human Long Noncoding Transcripts by Endoribonuclease-Prepared siRNAs

et al. 2015


Broad sequencing enterprises such as the FANTOM or ENCODE projects have substantially extended our knowledge of the human transcriptome. They have revealed that a large portion of genomic DNA is actively transcribed and have identified a plethora of novel transcripts. Many newly identified transcripts belong to the class of long noncoding RNAs (lncRNAs), which range from a few hundred bases to multiple kilobases in length and harbor no protein-coding potential. Although the biological activity of some lncRNAs is understood, the functions of most lncRNAs remain elusive. Tools that allow rapid and cost-effective access to functional data of lncRNAs are therefore essential. Here, we describe the construction and validation of an endoribonuclease-prepared siRNA (esiRNA) library designed to target 1779 individual human lncRNAs by RNA interference. We present a compendium of lncRNA expression data for 11 human cancer cell lines. Furthermore, we show that the resource is suitable for combined knockdown and localization analysis. We discuss challenges in sequence annotation of lncRNAs with respect to their often low and cell type-specific expression and specify esiRNAs that are suitable for targeting lncRNAs in commonly used human cell lines.

