Background
The effect of microbes on their human host is often mediated through changes in metabolite concentrations. As such, multiple tools have been proposed to predict metabolite concentrations from microbial taxa frequencies. Such tools typically fail to capture the dependence of the microbiome-metabolite relation on the environment.
Results
We propose to treat the microbiome-metabolome relation as the equilibrium of a complex interaction and to relate the host condition to a latent representation of the interaction between the log concentration of the metabolome and the log frequencies of the microbiome. We develop LOCATE (Latent variables Of miCrobiome And meTabolites rElations), a machine learning tool to predict the metabolite concentration from the microbiome composition and produce a latent representation of the interaction. This representation is then used to predict the host condition.
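As a rough illustration of the idea above (log frequencies mapped through a low-dimensional latent representation to log concentrations), the following minimal sketch uses a plain reduced-rank linear regression in NumPy. It is a stand-in for exposition only and is not LOCATE's actual architecture; the function names, the pseudocount `eps`, the latent dimension, and the synthetic data are all assumptions introduced here.

```python
import numpy as np


def log_transform(x, eps=1e-6):
    """Log-transform non-negative values; eps is a hypothetical pseudocount."""
    return np.log10(np.asarray(x, dtype=float) + eps)


def fit_latent_map(microbiome, metabolome, latent_dim=2):
    """Fit a reduced-rank linear map: log taxa -> latent -> log metabolites.

    This is classical reduced-rank regression, used here only as a simple
    linear stand-in for a learned latent representation.
    """
    X = log_transform(microbiome)
    Y = log_transform(metabolome)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean

    # Ordinary least-squares coefficients, shape (n_taxa, n_metabolites).
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)

    # Keep the leading right singular vectors of the fitted values.
    _, _, Vt = np.linalg.svd(Xc @ B, full_matrices=False)
    V = Vt[:latent_dim].T  # shape (n_metabolites, latent_dim)

    return {
        "x_mean": x_mean,
        "y_mean": y_mean,
        "encoder": B @ V,  # shape (n_taxa, latent_dim)
        "decoder": V.T,    # shape (latent_dim, n_metabolites)
    }


def encode(model, microbiome):
    """Project log taxa frequencies onto the latent representation."""
    return (log_transform(microbiome) - model["x_mean"]) @ model["encoder"]


def predict_log_metabolites(model, microbiome):
    """Predict log metabolite concentrations from the latent representation."""
    return encode(model, microbiome) @ model["decoder"] + model["y_mean"]


# Synthetic data for illustration only: 40 samples, 5 taxa, 7 metabolites.
rng = np.random.default_rng(0)
taxa = rng.dirichlet(np.ones(5), size=40)   # relative frequencies per sample
metabolites = rng.lognormal(size=(40, 7))   # positive concentrations

model = fit_latent_map(taxa, metabolites, latent_dim=2)
latent = encode(model, taxa)                # one latent vector per sample
predicted = predict_log_metabolites(model, taxa)
```

In such a setup, the rows of `latent` would serve as the input features of a separate host-condition classifier, in place of the raw microbiome or metabolome profiles.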
LOCATE’s accuracy in predicting the metabolome is higher than that of all current predictors. The metabolite concentration prediction accuracy decreases significantly across datasets and across conditions, especially in 16S data.
LOCATE’s latent representation predicts the host condition better than either the microbiome or the metabolome. This representation is strongly correlated with host demographics. A significant improvement in accuracy (0.793 vs. 0.724 average accuracy) is obtained even with a small number of metabolite samples ($$\sim 50$$).
Conclusion
These results suggest that a latent representation of the microbiome-metabolome interaction leads to a better association with the host condition than either of the two separately or a simple combination of the two.