Caffeine co‐crystal formation with other compounds is investigated in this study using a variety of machine learning (ML) methods. A total of 140 caffeine co‐crystal data points were used to train classification learners built with MATLAB ML models. Kernel neural network, ensemble tree‐based, logistic regression, and support vector machine algorithms were among the ML models tested. The logistic regression algorithm produced the most accurate predictions of caffeine co‐crystal formation, with a validation accuracy of 97.1%. Experiments and molecular interaction studies between caffeine and other tea compounds (catechin and catechol) were used to validate the ML predictions. As part of the evaluation, a random forest classifier was applied to screen 1440 known molecular descriptors, of which the 30 descriptors identified as responsible for caffeine co‐crystal formation were used for training and validation. The reliability of the trained logistic regression algorithm makes it suitable for predicting possible co‐crystals between caffeine and other compounds, thereby providing an understanding of caffeine co‐crystal formation without recourse to rigorous experimental tests.
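The classification and validation step summarized above can be illustrated with a minimal sketch. The study itself used MATLAB classification learners, so the pure-Python code below is not the authors' implementation. It assumes a binary label (co‐crystal formed or not), 140 samples with 30 descriptors each, 5-fold cross-validation, and gradient-descent hyperparameters chosen only for illustration. The data are synthetic placeholders, and all function names (`fit_logistic`, `cross_val_accuracy`, `make_synthetic`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=100, l2=0.01):
    """Fit weights and bias by batch gradient descent on the L2-penalized log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Predict class 1 when the linear score is non-negative (probability >= 0.5).
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def cross_val_accuracy(X, y, k=5, seed=0):
    """Return the fraction of samples classified correctly while held out."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, b, X[i]) == y[i] for i in fold)
    return correct / len(X)


def make_synthetic(n=140, d=30, seed=1):
    """Placeholder data only: NOT the study's descriptors or labels."""
    rng = random.Random(seed)
    true_w = [rng.gauss(0, 1) for _ in range(d)]
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [1 if sum(a * c for a, c in zip(true_w, x)) > 0 else 0 for x in X]
    return X, y


X, y = make_synthetic()
accuracy = cross_val_accuracy(X, y)
print(f"5-fold validation accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the reported 97.1%. With real data, the 30 selected descriptors would replace the columns of `X`, and standardizing each descriptor before fitting would likely be needed for gradient descent to behave well.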
Practical applications
Understanding caffeine co‐crystal formation with other tea components is crucial to explaining the generation of tea cream, which is undesirable to customers. Using a data‐driven (machine learning) approach to identify the possible molecular combinations involved in co‐crystal formation can significantly reduce experimental test requirements. Machine learning predictions can support investigations aimed at detailed characterization of ready‐to‐drink concentrated tea formulations. Furthermore, this study will assist researchers and policymakers in meeting the Sustainable Development Goal of “Industry, Innovation, and Infrastructure.”