Predicting colorectal cancer (CRC) based on fecal microbiota presents a promising method for non-invasive screening of CRC, but the optimization of classification models remains an unaddressed question. The purpose of this study was to systematically evaluate the effectiveness of different supervised machine-learning models in predicting CRC in two independent eastern and western populations. The structures of the fecal microbiota in a Chinese population (N = 141) were determined by 454 FLX pyrosequencing, and different supervised classifiers were employed to predict CRC based on fecal microbiota operational taxonomic units (OTUs). Bayes Net and Random Forest displayed higher accuracies than the other algorithms in both populations, although Bayes Net had a lower false negative rate than Random Forest. Gut microbiota-based prediction was more accurate than the standard fecal occult blood test (FOBT), and the combination of both approaches further improved the prediction accuracy. Moreover, when unclassified OTUs were used as input, the BayesDMNB text algorithm achieved higher accuracy in the Chinese population (AUC = 0.994). Taken together, our results suggest that the Bayes Net classification model combined with unclassified OTUs may present an accurate method for predicting CRC based on the composition of the gut microbiota.
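The abstract compares classifiers by accuracy, false negative rate and AUC. As a minimal sketch of how those three quantities relate, the Python below computes them from labels and classifier scores. The function names, the 0.5 decision threshold and the toy scores are illustrative assumptions and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), with 1 = CRC and 0 = healthy control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def false_negative_rate(y_true, y_pred):
    """Fraction of true CRC cases that the classifier missed."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return fn / (fn + tp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores, cutoff=0.5):
    return [1 if s >= cutoff else 0 for s in scores]


# Invented toy scores for two hypothetical models on the same 8 samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.8, 0.45, 0.4, 0.3, 0.25, 0.2, 0.1]
pred_a, pred_b = threshold(scores_a), threshold(scores_b)

print(accuracy(y_true, pred_a), false_negative_rate(y_true, pred_a))
print(accuracy(y_true, pred_b), false_negative_rate(y_true, pred_b))
print(auc(y_true, scores_a))
```

In this toy case both models reach the same accuracy (0.75), but model A misses one of four cancers (false negative rate 0.25) while model B misses two (0.5). This is why a screening comparison reports the false negative rate alongside accuracy.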
The gut microbiota is commonly referred to as a hidden organ due to its pivotal effects on host physiology, metabolism, nutrition and immunity. The gut microbes may be shaped by environmental and host genetic factors, and previous studies have focused on the roles of protein-coding genes. Here we show a link between long non-coding RNA (lncRNA) expression and gut microbes. By repurposing exon microarrays and comparing the lncRNA expression profiles between germ-free, conventional and different gnotobiotic mice, we revealed subgroups of lncRNAs that were specifically enriched in each condition. A nearest shrunken centroid methodology was applied to obtain lncRNA-based signatures to identify mice in different conditions. The lncRNA-based prediction model successfully distinguished different gnotobiotic mice from conventional and germ-free mice, and also discriminated between mice harboring microbes transplanted from fecal samples of mice or zebrafish. To achieve optimal prediction accuracy, the prediction model required fewer lncRNAs than protein-coding genes. Taken together, our study demonstrated the efficacy of lncRNA expression profiles in discriminating the types of microbes in the gut. These results also provide a resource of gut microbe-associated lncRNAs for the development of lncRNA biomarkers and the identification of functional lncRNAs in host-microbe interactions.
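The abstract names the nearest shrunken centroid method but gives no implementation details. The Python below is a simplified sketch of the general idea in its textbook form: class centroids are standardized, soft-thresholded toward the overall centroid by an amount delta, and features whose differences shrink to zero drop out of the signature. The function names, the choice of the median as the offset s0, the value of delta and the expression values are all illustrative assumptions, not the authors' pipeline.

```python
import math
from statistics import median


def fit_shrunken_centroids(X, y, delta):
    """Fit class centroids shrunk toward the overall centroid.

    Assumes at least two classes and non-constant features.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    counts, means = {}, {}
    for k in classes:
        rows = [row for row, label in zip(X, y) if label == k]
        counts[k] = len(rows)
        means[k] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    # Pooled within-class standard deviation of each feature.
    s = [math.sqrt(sum((row[j] - means[label][j]) ** 2
                       for row, label in zip(X, y)) / (n - len(classes)))
         for j in range(p)]
    s0 = median(s)  # offset that guards against tiny variances (assumed choice)
    centroids, active = {}, set()
    for k in classes:
        m_k = math.sqrt(1.0 / counts[k] - 1.0 / n)
        centroids[k] = []
        for j in range(p):
            scale = m_k * (s[j] + s0)
            d = (means[k][j] - overall[j]) / scale
            # Soft-threshold: move d toward zero by delta, stopping at zero.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shrunk != 0.0:
                active.add(j)
            centroids[k].append(overall[j] + scale * d_shrunk)
    return {"centroids": centroids,
            "spread": [sj + s0 for sj in s],
            "priors": {k: counts[k] / n for k in classes},
            "active": sorted(active)}


def predict(model, x):
    """Assign x to the class with the smallest standardized distance score."""
    def score(k):
        dist = sum(((xj - cj) / sp) ** 2 for xj, cj, sp
                   in zip(x, model["centroids"][k], model["spread"]))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)


# Invented expression values for three hypothetical lncRNAs (columns).
X = [[1.0, 5.0, 2.0], [1.2, 5.2, 2.1], [0.8, 4.8, 1.9],
     [3.0, 5.1, 2.0], [3.2, 4.9, 2.2], [2.8, 5.0, 1.8]]
y = ["germ_free"] * 3 + ["conventional"] * 3

model = fit_shrunken_centroids(X, y, delta=2.0)
print(model["active"])                   # indices of features kept in the signature
print(predict(model, [1.1, 5.0, 2.0]))
```

In practice delta would be chosen by cross-validation. A larger delta yields a smaller signature, which is the sense in which a model can need fewer features to reach its best accuracy.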