Objective: The gut microbiota is closely associated with colorectal neoplasia. While most metagenomic studies have used fecal samples, circulating microbial DNA in patients with colorectal neoplasia remains unexplored. This study aimed to characterize microbial DNA in plasma samples and to build a machine learning model for the early detection of colorectal neoplasia.
Design: We performed whole-genome sequencing of plasma samples from 25 colorectal cancer (CRC) patients, 10 colorectal adenoma (CRA) patients and 22 healthy controls (HC). Microbial DNA was obtained by removing host genome sequences, and relative abundance was measured by mapping the remaining reads to microbial genomes. Significant biomarker species were identified in the discovery cohort and used to build a random forest model, which was then tested in the validation cohort.
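As a minimal sketch of the abundance step, assume each read has already been assigned a reference label by an aligner. The labels `"host"`, `"Species A"` and `"Species B"` and the function `relative_abundance` are hypothetical and are not the study's pipeline. Host reads are dropped, and the remaining species counts are normalized so that they sum to 1:

```python
from collections import Counter


def relative_abundance(read_hits, host_label="host"):
    """Convert per-read mapping labels into species relative abundances.

    read_hits: iterable of reference labels, one per read; None marks an
    unmapped read. Reads labeled as host are discarded first (a stand-in for
    host-genome removal), then each species' read count is divided by the
    total number of microbial reads.
    """
    counts = Counter(
        label for label in read_hits
        if label is not None and label != host_label
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no microbial reads left after host removal
    return {species: n / total for species, n in counts.items()}


# Hypothetical reads from one plasma sample: most map to the human genome.
reads = ["host"] * 95 + ["Species A"] * 3 + ["Species B"] * 1 + [None]
print(relative_abundance(reads))  # {'Species A': 0.75, 'Species B': 0.25}
```

Normalizing by microbial reads rather than by all reads keeps abundances comparable across samples with different amounts of host DNA. This is an illustrative choice, not a detail stated in the abstract.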
Results: In the discovery cohort, 127 species differed significantly between CRC patients and HC. Using the random forest model, 28 species were selected in the discovery cohort (AUC = 0.944); these yielded an AUC of 1 in the validation cohort. Interestingly, the relative abundance of most biomarker species in CRA patients lay between that of CRC patients and HC, with a trend toward CRC patients. Pathway enrichment analysis showed a similar pattern, with CRA patients having intermediate relative abundance of significant pathways compared with CRC patients and HC. Finally, species network analysis revealed distinct patterns of species association in CRC patients and HC.
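The AUC values can be read as a ranking probability. The sketch below uses a hypothetical `roc_auc` helper and made-up scores and does not reproduce the study's random forest or data. It computes AUC from a classifier's output scores using the pairwise (Mann-Whitney) formulation:

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC as the probability that a randomly chosen positive sample
    (e.g. CRC) scores higher than a randomly chosen negative sample (e.g. HC).

    Ties count as half a win. This equals the Mann-Whitney U statistic
    divided by n_pos * n_neg.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted CRC probabilities from a classifier, e.g. the
# fraction of random-forest trees voting "CRC" for each sample.
crc = [0.91, 0.84, 0.77, 0.62]
hc = [0.12, 0.30, 0.45, 0.70]
print(roc_auc(crc, hc))  # 0.9375 (15 of 16 CRC/HC pairs ranked correctly)
```

Under this definition, an AUC of 1 means every positive sample received a higher score than every negative sample.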
Conclusions: We demonstrated characteristic alterations of circulating bacterial DNA in patients with colorectal neoplasia. The predictive model accurately distinguished CRC and CRA from HC, suggesting that circulating bacterial biomarkers could serve as a non-invasive tool for colorectal neoplasia screening and early diagnosis.