1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data and instead advocate alternatives, such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs.
2. We tested these theoretical predictions via simulations and a real-world dataset.
3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed.
4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
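As a minimal sketch of the two normalizations favored above, the following Python shows proportions and rarefying applied to a toy OTU table. The function names, the toy counts, and the use of Bray-Curtis dissimilarity as the community-level comparison are illustrative assumptions, not taken from the study.

```python
import random

def to_proportions(sample):
    """Divide each OTU count by the sample's total read depth."""
    total = sum(sample)
    if total == 0:
        raise ValueError("sample has no reads")
    return [count / total for count in sample]

def rarefy(sample, depth, seed=0):
    """Subsample `depth` reads without replacement (rarefying)."""
    if depth > sum(sample):
        raise ValueError("depth exceeds the sample's read count")
    # Expand counts into one entry per read, labelled by OTU index.
    reads = [otu for otu, count in enumerate(sample) for _ in range(count)]
    drawn = random.Random(seed).sample(reads, depth)
    rarefied = [0] * len(sample)
    for otu in drawn:
        rarefied[otu] += 1
    return rarefied

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

# Hypothetical toy data: same community, sequenced at 100 and 1000 reads.
shallow = [80, 15, 5]
deep = [800, 150, 50]

raw_distance = bray_curtis(shallow, deep)
prop_distance = bray_curtis(to_proportions(shallow), to_proportions(deep))
rarefied_deep = rarefy(deep, depth=sum(shallow))
```

In this toy case the raw counts look very dissimilar only because of read depth, whereas the proportions are identical. Rarefying brings both samples to the same depth but adds sampling noise, so its result depends on the random seed.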
Contamination is a ubiquitous problem in microbiome research and can skew results, especially when small amounts of target DNA are available. Nevertheless, no clear solution has emerged for removing microbial contamination. To address this problem, we developed the R package microDecon (https://github.com/donaldtmcknight/microDecon), which uses the proportions of contaminant operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) in blank samples to systematically identify and remove contaminant reads from metabarcoding data sets. We rigorously tested microDecon using a series of computer simulations and a sequencing experiment. We also compared it to the common practice of simply removing all contaminant OTUs/ASVs and other methods for removing contamination. Both the computer simulations and our sequencing data confirmed the utility of microDecon. In our largest simulation (100,000 samples), using microDecon improved the results in 98.1% of samples. Additionally, in the sequencing data and in simulations involving groups, it enabled accurate clustering of groups as well as the detection of previously obscured patterns. It also produced more accurate results than the existing methods for identifying and removing contamination. These results demonstrate that microDecon effectively removes contamination across a broad range of situations. It should, therefore, be widely applicable to microbiome studies, as well as to metabarcoding studies in general.
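The general idea of using blank-sample proportions to subtract contaminant reads can be illustrated with a deliberately simplified Python sketch. This is not microDecon's actual algorithm or API. It assumes, purely for illustration, that one OTU (the hypothetical `constant_idx`) is known to be entirely contamination in the sample, so that its ratio to the blank can be used to scale the blank's other OTUs.

```python
def subtract_blank(blank, sample, constant_idx):
    """Remove reads attributable to contamination, scaled from a blank.

    Assumption (illustrative only): the OTU at `constant_idx` consists
    entirely of contamination in `sample`, so the ratio
    sample[constant_idx] / blank[constant_idx] tells us how much of the
    blank's contamination profile is present in the sample.
    """
    if blank[constant_idx] == 0:
        raise ValueError("constant OTU must be present in the blank")
    cleaned = []
    for blank_count, sample_count in zip(blank, sample):
        # Expected contaminant reads for this OTU in the sample.
        expected = blank_count * sample[constant_idx] / blank[constant_idx]
        # Never go below zero reads after subtraction.
        cleaned.append(max(0, round(sample_count - expected)))
    return cleaned

# Hypothetical counts: OTUs 0 and 1 appear in the blank, OTUs 2 and 3 do not.
blank = [100, 50, 0, 0]
sample = [20, 30, 500, 200]
cleaned = subtract_blank(blank, sample, constant_idx=0)
```

Unlike removing every OTU that appears in a blank, this kind of proportional subtraction keeps the reads of OTU 1 that exceed what contamination alone would explain.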