2023
DOI: 10.1101/2023.03.11.532198
Preprint

Data-Driven Mathematical Approach for Removing Rare Features in Zero-Inflated Datasets

Abstract: Sparse feature tables, in which many features are present in very few samples, are common in big biological data (e.g., metagenomics, transcriptomics). Ignoring the problem of zero-inflation can result in biased statistical estimates and decrease power in downstream analyses. Zeros are also a particular issue for compositional data analysis using log-ratios since the log of zero is undefined. Researchers typically deal with zero-inflated data by removing low frequency features, but the thresholds for removal d…

Cited by 6 publications (3 citation statements)
References 38 publications
“…Sequences classified as mitochondria were removed with qiime taxa filter-table, leaving 741 SVs in the feature table. To deal with sparse features, SVs not present in at least five samples were removed using CurvCut [34], resulting in a final count of 333 SVs.…”
Section: 16S rRNA Amplicon Sequencing (mentioning)
confidence: 99%
“…Salmon was used to generate a count table of coding sequences (genes) for each sample [47]. Gene features not present in at least four samples were removed using CurvCut [34].…”
Section: Shotgun Metagenome Genome Assemblies and Annotation (mentioning)
confidence: 99%
“…Removal of likely contaminants resulted in 741 SVs in the fecal samples, 606 SVs in the cecum samples, 832 SVs in ileal samples, and 1554 SVs in duodenum samples. These SVs were then zero-filtered to remove very low-abundance SVs using the CurvCut heuristic approach [86], which suggested feature removal of SVs present in 3 or fewer samples, resulting in a final count of 333 fecal SVs, 309 cecum SVs, 286 ileal SVs, and 281 duodenum SVs.…”
Section: 16S rRNA Gene Sequence Quality Control and QIIME2 (mentioning)
confidence: 99%
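
The citation statements above describe the filtering step only by its outcome: features (SVs or genes) present in fewer than some number of samples are dropped. A minimal sketch of that prevalence filter follows, assuming a samples-by-features count table held as a list of rows and a threshold supplied by hand. It does not reproduce how CurvCut arrives at its suggested threshold; the function name `filter_by_prevalence` and the toy counts are hypothetical.

```python
def filter_by_prevalence(table, min_samples):
    """Keep features (columns) with a nonzero count in at least `min_samples` samples (rows).

    Illustrative only: the threshold is passed in by the caller, not derived from the data.
    """
    n_features = len(table[0]) if table else 0
    # Prevalence = number of samples in which each feature has a count above zero.
    prevalence = [sum(1 for row in table if row[j] > 0) for j in range(n_features)]
    keep = [j for j, p in enumerate(prevalence) if p >= min_samples]
    filtered = [[row[j] for j in keep] for row in table]
    return filtered, keep


# Made-up toy table: 5 samples (rows) x 4 features (columns).
toy = [
    [3, 0, 1, 0],
    [5, 0, 2, 0],
    [1, 0, 0, 7],
    [2, 4, 1, 0],
    [8, 0, 3, 0],
]

# "Remove features present in 3 or fewer samples" is the same as
# "keep features present in at least 4 samples".
filtered, kept = filter_by_prevalence(toy, min_samples=4)
```

With this toy table the first and third features survive, since they are nonzero in five and four samples respectively, while the other two appear in only one sample each.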