2014
DOI: 10.1186/2049-2618-2-15

Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis

Abstract: Background: Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experiment…

citations
Cited by 1,015 publications
(973 citation statements)
references
References 45 publications
“…Of course, many other methods exist, including but not limited to: Cuffdiff (Trapnell et al (2010)), Cuffdiff2 (Trapnell et al (2013)), NBPSeq (Di, Schafer, Cumbie, and Chang (2011)), TSPM (Auer and Doerge (2011)), baySeq (Hardcastle and Kelly (2010)), EBSeq (Leng et al (2013)), NOISeq (Tarazona, García-Alcalde, Dopazo, Ferrer, and Conesa (2011)), SAMseq (J. Li and Tibshirani (2013)), ShrinkSeq (Van De Wiel et al (2012)), DEGSeq (Wang, Feng, Wang, Wang, and Zhang (2010)), BBSeq (Y.-H. Zhou, Xia, and Wright (2011)), FDM (Singh et al (2011)), RSEM (B. Li and Dewey (2011)), Myrna (Langmead, Hansen, and Leek (2010)), PANDORA (Moulos and Hatzis (2014)), ALDEx2 (Fernandes et al (2014)), PoissonSeq (J. Li, Witten, Johnstone, and Tibshirani (2011)), and GPSeq (Srivastava and Chen (2010)). We provide code that can be easily adapted to any method that runs in R and applied to the publicly available data sets we used, as well as others.…”
Section: Cc-by-nd 40 International License Peer-reviewed) Is the Aut
Citation type: mentioning
confidence: 99%
“…Log-ratio analysis of compositional data does not allow null proportions as an argument of a logarithm, thus requiring special treatment of such data (Martín-Fernández, Hron, Templ, Filzmoser, and Palarea-Albaladejo 2015a; Gloor, Macklaim, Pawlowsky-Glahn, and Egozcue 2017). The present contribution is not aimed at discussing procedures and methods for treating these zero count data, although Bayesian estimation methods show some promise (Fernandes et al 2014). Therefore, the number of genera used in the examples has been reduced to 12 by removing all genera with a total count across all samples of less than 5000, or that have zero counts in more than 100 samples.…”
Section: Example Using An 16S rRNA Gene Profiling Case
Citation type: mentioning
confidence: 99%
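The filtering rule quoted in this statement (drop genera whose total count is below 5000, or that have zero counts in more than 100 samples) can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function name `filter_genera`, its parameter names, and the samples-by-genera array layout are assumptions made for the example.

```python
import numpy as np


def filter_genera(counts, min_total=5000, max_zero_samples=100):
    """Drop sparse genera from a samples x genera count table.

    Hypothetical helper: a genus is kept only if its total count across
    all samples is at least `min_total` AND it has zero counts in at most
    `max_zero_samples` samples. Returns the filtered table and the mask.
    """
    counts = np.asarray(counts)
    totals = counts.sum(axis=0)                # total reads per genus
    zero_samples = (counts == 0).sum(axis=0)   # samples where genus is absent
    keep = (totals >= min_total) & (zero_samples <= max_zero_samples)
    return counts[:, keep], keep


# Toy table (3 samples x 3 genera) with small thresholds for illustration.
toy = np.array([[10, 0, 3],
                [20, 0, 0],
                [30, 1, 4]])
filtered, keep = filter_genera(toy, min_total=5, max_zero_samples=1)
```

In the toy table the second genus is removed because it fails both criteria, while the third is kept because it has exactly one zero, which does not exceed the limit.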
“…Here we will describe R packages specifically intended for metagenomic analysis: basic methods commonly used for a comparison of two or more groups [on the example of their implementation in ALDEx2 package (Fernandes et al, 2014)] and advanced approaches based on generalized linear models allowing both continuous and discrete factors [metagenomeSeq (Paulson et al, 2013), edgeR (McCarthy et al, 2012), DESeq2 (Love et al, 2014), MaAsLin, shotgunFunctionalizeR (Kristiansson et al, 2009)]. Finally, the methods for vector-wise rather than component-wise comparison will be introduced [HMP (La Rosa et al, 2012), vegan (Oksanen et al, 2012), micropower (Kelly et al, 2015)].…”
Section: Total Read Count Varies Between the Samples
Citation type: mentioning
confidence: 99%
“…., α_n) is a feature vector (with an additional pseudocount of 0.5 added to each component) and B(α) is the multivariate beta function. The greater the taxon abundance and the less the whole number of reads for the sample, the greater the variance (Fernandes et al, 2014). Substituting the original feature vector with several random vectors generated from the corresponding Dirichlet distribution leads to a more correct estimation of variance and thus of significance of differences.…”
Section: Component-wise Analysis
Citation type: mentioning
confidence: 99%
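The resampling idea in this statement (replace one sample's count vector with several random proportion vectors drawn from a Dirichlet distribution whose parameters are the counts plus a pseudocount of 0.5) can be sketched as below. This is only an illustration under stated assumptions, not the ALDEx2 implementation: the function name `dirichlet_instances`, the default of 128 draws, and the fixed seed are hypothetical choices.

```python
import numpy as np


def dirichlet_instances(counts, n_instances=128, pseudocount=0.5, seed=0):
    """Draw Monte Carlo proportion vectors for one sample.

    Hypothetical helper: alpha = counts + pseudocount, as in the quoted
    passage; the number of instances and the seed are assumptions.
    Returns an array of shape (n_instances, n_features).
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(counts, dtype=float) + pseudocount
    return rng.dirichlet(alpha, size=n_instances)


# One sample with three features, including a zero count.
instances = dirichlet_instances([0, 10, 990], n_instances=64)

# Spread of each feature's proportion across the Monte Carlo instances.
per_feature_variance = instances.var(axis=0)
```

Each row of `instances` is a proportion vector summing to 1, so downstream statistics can be computed per instance and then summarized, rather than relying on the single observed count vector.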