Pharmaceutical cocrystals are crystalline materials composed of
at least two molecules, i.e., an active pharmaceutical
ingredient (API) and a coformer, assembled by noncovalent forces.
Cocrystallization has been successfully applied to improve the physicochemical
properties of APIs, such as solubility, dissolution profile, pharmacokinetics,
and stability. However, choosing the ideal coformer is a challenging
task in terms of time, effort, and laboratory resources. Several
computational tools and machine learning (ML) models have been proposed
to mitigate this problem. Nevertheless, achieving a robust and generalizable
predictive method remains an open challenge. In this study,
we propose a new approach to quickly predict the formation of cocrystals,
employing partial least squares-discriminant analysis, random forest,
and neural networks. The models were built on data sets of 13
structurally different APIs with both positive and negative cocrystallization
outcomes. The features were specifically selected from
a variety of molecular descriptors to explain the cocrystallization
phenomenon. All of the proposed ML models showed a cross-validation
accuracy higher than 83%. Furthermore, this approach was successfully
applied to guide experimental cocrystallization tests of 2-phenylpropionic
acid, demonstrating the practical potential of the ML models.