2019
DOI: 10.1093/gigascience/giz107

A field guide for the compositional analysis of any-omics data

Abstract: Background: Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a …

Cited by 244 publications (216 citation statements)
References: 79 publications
“…For example, transcription of asRNA may constitute a significant percentage of the data and may be associated with only a few genes. As sequencing data are inherently compositional, there will be an overrepresentation of spurious negative correlations with the remaining gene population, which cannot be amended using traditional quantitative data analysis (41). This is true regardless of whether the highly expressed genes are systematically related to the experiment or not.…”
Section: Results (mentioning)
confidence: 99%
“…The naive solution will be to quantify mRNA only, ensuring that there are sufficient data for proper mRNA quantification. However, compositional data analysis methods exist, for which these issues can be amended (41,42). As a minimum, we encourage paying attention to highly expressed genes with high fractions of asRNA, e.g., >90%, and either naively discarding them from downstream analysis or performing a thorough investigation to verify their credibility using existing tools for detecting spurious open reading frames, such as AntiFam (35).…”
Section: Results (mentioning)
confidence: 99%
“…These standard strategies are widely employed, but have recently been questioned due to the compositional nature of whole metagenomic sequencing data [29,30]. To address this issue, several Compositional Data Analysis (CoDA) approaches to analyze sequencing datasets have been recently proposed [31,32].…”
Section: Such Unique Features Make Standard Parametric Tests and Most (mentioning)
confidence: 99%
“…Beyond requiring several pre-processing steps, the summarized data arise from a sampling process that introduces between-sample biases in which the total number of counts, called the sequencing depth, depends on technical factors, not on the amount of input material [12,42,37]. Analysts often attempt to remove this bias with an effective library size normalization, or with normalization to a spike-in or house-keeping transcript [27] (though all normalizations have limitations [36]). Instead, one could build normalization-free gene co-expression networks using proportionality [26].…”
Section: Introduction (mentioning)
confidence: 99%
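
Several of the statements above turn on the point that sequencing depth is set by technical factors rather than by the amount of input material. As a minimal sketch of why log-ratio methods are attractive in that setting, the Python below applies a centered log-ratio (CLR) transform to one hypothetical sample at two depths. The counts, the `clr` helper, and the 50x depth factor are illustrative assumptions, not taken from the paper or from any citing work, and the sketch assumes strictly positive counts (real data would need zero handling first).

```python
import math


def clr(counts):
    """Centered log-ratio transform of one sample of strictly positive counts.

    Each value is the log of a count minus the mean of the log counts,
    i.e. the log of the count divided by the sample's geometric mean.
    """
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical sample: the same material sequenced at two depths.
shallow = [10, 20, 70]
deep = [c * 50 for c in shallow]  # 50x the depth, identical proportions

clr_shallow = clr(shallow)
clr_deep = clr(deep)

# The depth factor cancels inside the log-ratio, so both vectors agree
# up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(clr_shallow, clr_deep))
print(max_diff < 1e-9)
```

Because multiplying every count by a constant adds the same amount to every log and to their mean, the transformed values depend only on the ratios between components, not on the total.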