Identifying differentially expressed transcripts from RNA-seq data with biological variation

Glaus, Peter; Honkela, Antti; Rattray, Magnus

doi:10.1093/bioinformatics/bts260

Cited by 188 publications

(226 citation statements)

References 35 publications

Mentioning: 223

“…idx <- which(res$adj_pvalue < 0.05) [1] A typical analysis of differential transcript usage would involve asking first: "which genes contain any evidence of DTU?", and secondly, "which transcripts in the genes that contain some evidence may be participating in the DTU?"…”

Section: ## [1] True (mentioning)

confidence: 99%

Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification

2018

Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.
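The two-stage idea described in this abstract (screen genes first, then confirm transcripts only within genes that passed) can be illustrated with a minimal Python sketch. This is not the stageR implementation: the function names `bh_adjust` and `two_stage`, the input dictionaries, and the choice of a Bonferroni correction within each gene at level alpha × (genes passed / genes tested) are all assumptions made for illustration.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def two_stage(gene_pvals, transcript_pvals, alpha=0.05):
    """Hypothetical two-stage test.

    gene_pvals: {gene: p-value for "any DTU in this gene"}
    transcript_pvals: {gene: {transcript: p-value}}
    """
    genes = list(gene_pvals)
    # Stage 1 (screening): control the FDR across genes.
    adj = bh_adjust([gene_pvals[g] for g in genes])
    significant = [g for g, q in zip(genes, adj) if q < alpha]
    # Stage 2 (confirmation): the per-gene level shrinks with the
    # fraction of genes that passed screening (an assumed simplification).
    level = alpha * len(significant) / len(genes) if genes else 0.0
    confirmed = {}
    for g in significant:
        tx = transcript_pvals[g]
        # Bonferroni correction over the transcripts of this gene.
        confirmed[g] = [t for t, p in tx.items() if p * len(tx) < level]
    return significant, confirmed
```

Calling `two_stage` with gene-level and transcript-level p-values returns the screened genes and, for each, the transcripts that survive confirmation; transcripts in genes that failed screening are never tested.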

“…Several software packages have been developed for performing such “simple” counting (e.g., featureCounts 1 and HTSeq-count 2). More recently, the field has seen a surge in methods aimed at quantifying the abundances of individual transcripts (e.g., Cufflinks 3, RSEM 4, BitSeq 5, kallisto 6 and Salmon 7). These methods provide higher resolution than simple counting, and by circumventing the computationally costly read alignment step, some are considerably faster.…”

Section: Introduction (mentioning)

confidence: 99%

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

2015

High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Several different quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

“…Both the number of mixture components (transcripts) and observations (reads) cover a wide range. The method described in Glaus et al (2012) was implemented in order to compute the likelihood of the n reads to the K transcripts, as well as to obtain an MCMC sample from the posterior distribution. Finally, the VB methods were applied.…”

Section: RNA-seq Datasets (mentioning)

confidence: 99%

“…Li et al (2010) applied a maximum likelihood approach, using the expectation-maximization (EM) algorithm (Dempster et al, 1977). The Bayesian approach was followed by Katz et al (2010), Turro et al (2011) and Glaus et al (2012), via Markov chain Monte Carlo (MCMC) sampling. However, the high dimensionality of RNA-seq datasets imposes certain new inferential difficulties, making convergence of MCMC methods a time consuming task.…”

Section: Introduction (mentioning)

confidence: 99%

“…In particular, we show that it is better to approximate the actual posterior distribution, rather than the joint posterior of model parameters and latent variables. The proposed methodology builds upon the BitSeq model (Glaus et al, 2012) and exploits the solution of standard VB. An optimization is performed over a class of distributions that share the same mean as the VB solution, but the variance is different.…”

Section: Introduction (mentioning)

confidence: 99%

Improved variational Bayes inference for transcript expression estimation

Papastamoulis, Hensman, Glaus, et al.

2014

Statistical Applications in Genetics and Molecular Biology

Self Cite

RNA-seq studies allow for the quantification of transcript expression by aligning millions of short reads to a reference genome. However, transcripts share much of their sequence, so that many reads map to more than one place and their origin remains uncertain. This problem can be dealt with using mixtures of distributions, and transcript expression reduces to estimating the weights of the mixture. In this paper, variational Bayesian (VB) techniques are used in order to approximate the posterior distribution of transcript expression. VB has previously been shown to be more computationally efficient for this problem than Markov chain Monte Carlo. VB methodology can precisely estimate the posterior means, but leads to variance underestimation. For this reason, a novel approach is introduced which integrates the latent allocation variables out of the VB approximation. It is shown that this modification leads to a better marginal likelihood bound and improved estimate of the posterior variance. A set of simulation studies and application to real RNA-seq datasets highlight the improved performance of the proposed method.
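To make the mixture formulation in this abstract concrete, the sketch below estimates mixture weights (relative transcript expression) from reads that may be compatible with several transcripts. It uses a plain expectation-maximization point estimate, the maximum likelihood approach mentioned in the citation statements above, and not the variational Bayes or MCMC methods the abstract discusses. The function name `em_mixture_weights` and the input layout are hypothetical, and the per-read likelihoods are assumed to be precomputed and fixed.

```python
def em_mixture_weights(likelihoods, n_iter=200):
    """Estimate mixture weights by EM.

    likelihoods[i][k] is an assumed, precomputed P(read i | transcript k).
    Every read must be compatible with at least one transcript,
    otherwise the normalizing total below would be zero.
    """
    n_reads = len(likelihoods)
    k = len(likelihoods[0])
    theta = [1.0 / k] * k  # start from uniform weights
    for _ in range(n_iter):
        counts = [0.0] * k
        for row in likelihoods:
            # E-step: posterior probability that this read came from each transcript.
            weighted = [theta[j] * row[j] for j in range(k)]
            total = sum(weighted)
            for j in range(k):
                counts[j] += weighted[j] / total
        # M-step: new weights are the expected fractions of reads per transcript.
        theta = [c / n_reads for c in counts]
    return theta
```

With three reads unique to one transcript, one read unique to another, and four reads equally compatible with both, the ambiguous reads are shared out in proportion to the current weights, so the estimate settles at the 3:1 ratio set by the unique reads.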
