Background
Enteric methane from cow burps, which results from microbial fermentation of high-fiber feed in the rumen, is a significant contributor to greenhouse gas emissions. A promising strategy to address this problem is microbiome-based precision feed, which involves identifying the key microorganisms responsible for methane production. While machine learning algorithms have shown success in associating the human gut microbiome with various human diseases, there have been limited efforts to employ these algorithms to establish microbial biomarkers for methane emissions in ruminants.

Methods
In this study, we aim to identify potential microbial biomarkers for methane emission from ruminants by employing regression algorithms commonly used in human microbiome studies, coupled with different feature selection methods. To achieve this, we analyzed the microbiome compositions and identified possible confounding metadata variables in two large public datasets of Holstein cows. Using both the microbiome features and the identified metadata variables, we trained different regressors to predict methane emission. With the optimized models, permutation tests were used to determine feature importance and thereby find informative microbial features.

Results
Among the regression algorithms tested, random forest regression outperformed the others and allowed the identification of several taxa of the native rumen microbiome that are crucial for methane emission, including the genera Piromyces, Succinivibrionaceae UCG-002, and Acetobacter. Additionally, our results revealed that certain herd locations and feed composition markers, such as lipid intake and neutral-detergent fiber intake, are also predictive features for methane emissions.

Conclusion
We demonstrated that machine learning, particularly regression algorithms, can effectively predict cow methane emissions and identify relevant rumen microorganisms.
Our findings offer valuable insights for the development of microbiome-based precision feed strategies aimed at reducing methane emissions.
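
The general workflow described in the Methods (train a regressor on microbial features, then rank features by permutation importance) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the feature matrix `X` stands in for hypothetical taxon abundances, the target `y` is an invented methane signal driven by two arbitrarily chosen features, and scikit-learn is assumed as the library.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a taxon abundance table (assumption, not real data):
# 300 animals, 20 hypothetical taxa with abundances in [0, 1).
rng = np.random.default_rng(0)
n_samples, n_taxa = 300, 20
X = rng.random((n_samples, n_taxa))

# Invented methane signal: driven by taxa 0 and 3 plus small Gaussian noise.
y = 5.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(0.0, 0.1, n_samples)

# Hold out a test set so importance is measured on unseen animals.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a random forest regressor on the training split.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # coefficient of determination (R^2)

# Permutation importance: shuffle one feature at a time on the test split
# and record the mean drop in R^2 over repeated shuffles.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Rank features from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
top_two = set(ranking[:2].tolist())

print(f"test R^2: {test_r2:.3f}")
print("two most important feature indices:", sorted(top_two))
```

In a real analysis the columns of `X` would be taxon abundances plus any metadata covariates, and the ranking would be read as a list of candidate biomarkers rather than as evidence of a causal role.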