Breast milk serves as a vital source of essential nutrients for
infants. However, human milk contamination via the transfer of environmental
chemicals from the maternal exposome is a significant concern for infant
health. The milk-to-plasma concentration (M/P) ratio is a critical
metric that quantifies the extent to which these chemicals transfer
from maternal plasma into breast milk, impacting infant exposure.
Machine learning-based predictive toxicology models can be valuable
in predicting chemicals with a high propensity to transfer into human
milk. To this end, we build such classification- and regression-based
models by employing multiple machine learning algorithms and leveraging
the largest curated data set to date, comprising 375 chemicals with known
M/P ratios. Our support vector machine
(SVM)-based classifier outperforms the other models across multiple
performance metrics when evaluated on both (internal) test data and
an external test data set. Specifically, on the (internal) test data,
the SVM-based classifier achieved a classification accuracy of 77.33%,
a specificity of 84%, a sensitivity of 64%, and an F-score of 65.31%.
When evaluated on the external test data set, the SVM-based classifier
generalized well, achieving a sensitivity
of 77.78%. While we were able to build highly predictive classification
models, our best regression models for predicting the M/P ratio of
chemicals could achieve only moderate R² values on the (internal) test
data. As noted in the earlier literature,
our study also highlights the challenges in developing accurate regression
models for predicting the M/P ratio of xenobiotic chemicals. Overall,
this study attests to the immense potential of predictive computational
toxicology models in characterizing the myriad chemicals in the
human exposome.
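
For readers who want to relate the reported classification metrics to one another, the following is a minimal sketch of the standard definitions of accuracy, sensitivity, specificity, and F-score computed from a binary confusion matrix. The function name `classification_metrics` and the confusion-matrix counts used in the example are assumptions made for illustration only: the counts are one set of values that happens to reproduce the reported percentages, not figures taken from the study.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: positive-class chemicals classified correctly/incorrectly.
    tn/fp: negative-class chemicals classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),        # true positive rate (recall)
        "specificity": tn / (tn + fp),        # true negative rate
        "f_score": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical counts (an assumption, not reported in the abstract) chosen
# because they are consistent with the reported internal-test metrics.
metrics = classification_metrics(tp=16, fn=9, tn=42, fp=8)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

The sketch also makes visible why the F-score (65.31%) sits well below the accuracy (77.33%): the F-score ignores true negatives, so it is pulled down by the comparatively low sensitivity.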