Background
Microbial communities in the human body, also known as human microbiota, impact human health, such as colorectal cancer (CRC). However, the different roles that microbial communities play in healthy and disease hosts remain largely unknown. The microbial communities are typically recorded through the taxa counts of operational taxonomic units (OTUs). The sparsity and high correlations among OTUs pose major challenges for understanding the microbiota-disease relation. Furthermore, the taxa data are structured in the sense that OTUs are related evolutionarily by a hierarchical structure.
Results
In this study, we borrow the idea of super-variant from statistical genetics, and propose a new concept called super-taxon to exploit hierarchical structure of taxa for microbiome studies, which is essentially a combination of taxonomic units. Specifically, we model a genus which consists of a set of OTUs at low hierarchy and is designed to reflect both marginal and joint effects of OTUs associated with the risk of CRC to address these issues. We first demonstrate the power of super-taxon in detecting highly correlated OTUs. Then, we identify CRC-associated OTUs in two publicly available datasets via a discovery-validation procedure. Specifically, four species of two genera are found to be associated with CRC: Parvimonas micra, Parvimonas sp., Peptostreptococcus stomatis, and Peptostreptococcus anaerobius. More importantly, for the first time, we report the joint effect of Parvimonas micra and Parvimonas sp. (p = 0.0084) as well as that of Peptostrepto-coccus stomatis and Peptostreptococcus anaerobius (p = 8.21e-06) on CRC. The proposed approach provides a novel and useful tool for identifying disease-related microbes by taking the hierarchical structure of taxa into account and further sheds new lights on their potential joint effects as a community in disease development.
Conclusions
Our work shows that proposed approaches are effective to study the microbiota-disease relation taking into account for the sparsity, hierarchical and correlated structure among microbes.