In this work, a novel strategy combining molecular thermodynamics
and machine learning was proposed to accurately predict the solubility
of drugs in various solvents. The strategy was based on 16 molecular
descriptors representing drug–drug interactions and drug–solvent
interactions, including physical parameters, pure-component perturbed-chain statistical
associating fluid theory (PC-SAFT) parameters of drugs and solvents,
and mixing rules. These molecular descriptors were used as inputs to five
machine learning algorithms [multiple linear regression (MLR), artificial
neural network (ANN), random forest (RF), extremely randomized trees
(ET), and support vector machine (SVM)] to train the predictive model.
A single-hidden-layer neural network was ultimately selected as the
model for predicting the solubility of drugs in various solvents.
The model also successfully predicted drug solubility in the
generalization evaluation set, which indicates good predictive
performance. Three directions for improving the model
were identified: adding molecular descriptors of drug–solvent
interactions in the water system, adding descriptors of drug–drug
interactions in the organic solvent system, and expanding the dataset
to adequately capture the features of multiple drugs. These findings show that the
proposed model is capable of predicting drug solubility and
is expected to provide important information for drug development
and drug solvent screening.
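
The model-comparison step described above can be illustrated with a minimal sketch, assuming scikit-learn as the toolkit. The data here are synthetic placeholders (200 random samples with 16 features), not the descriptors or solubility values used in this work, and the hyperparameters, the 5-fold cross-validation, and the R^2 selection criterion are all illustrative assumptions rather than the settings actually used.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic placeholder data: 200 samples x 16 descriptors.
# This is NOT the paper's dataset; it only gives the models something to fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# The five algorithm families named in the abstract.
# All hyperparameters below are assumed values for illustration.
models = {
    "MLR": LinearRegression(),
    # One hidden layer, matching the "single-hidden-layer" description;
    # the width of 16 units is an arbitrary choice.
    "ANN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "ET": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "SVM": SVR(kernel="rbf", C=10.0),
}

# Score every model with 5-fold cross-validated R^2 on standardized inputs.
scores = {}
for name, model in models.items():
    pipeline = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2").mean()

# Select the model with the highest mean cross-validated R^2.
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
print("selected:", best)
```

On real data, the ranking of the five models depends on the descriptors and the dataset, so the model selected on this synthetic example says nothing about which one performs best for solubility prediction.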