Prediction of drug solubility is
a crucial problem in the pharmaceutical
industry for both drug delivery and drug discovery. Several
theoretical approaches have been proposed to predict drug solubility
in mixed solvent systems when the solubility values in pure solvents
are known. Quantitative structure-property relationship (QSPR) approaches
are gaining attention for predicting physical properties because of
their robustness and computational tractability. In this work, a
machine-learning-based QSPR approach is proposed to predict drug solubility
in binary solvent systems from structural features such as molar
refractivity, McGowan volume, and topological surface area.
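In generic terms, a QSPR model of this kind maps the descriptor vector x_i of sample i (molar refractivity, McGowan volume, topological surface area, and so on) to a predicted solubility. The notation below is ours, not the paper's, and whether the response y is solubility itself or a transform such as log solubility is left open as an assumption. The linear model with its ordinary least-squares estimate is the simplest instance:

```latex
% Generic QSPR mapping for sample i with p selected descriptors
\hat{y}_i = f(\mathbf{x}_i), \qquad
\mathbf{x}_i = \left(x_{i1}, \dots, x_{ip}\right)^{\top}

% Linear special case; X is the n x (p+1) design matrix whose first
% column is all ones, and y is the vector of n observed solubilities
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad
\hat{\boldsymbol{\beta}} = \left(X^{\top} X\right)^{-1} X^{\top} \mathbf{y}
```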
A genetic-algorithm-based feature selection procedure is used to check
the dependency among the selected features and to obtain the final
set of significant features. First, solubility is assumed to depend
linearly on the structural features, and the model parameters
are estimated using ordinary least squares and a weight-based optimization
approach. Next, solubility is assumed to be piecewise linear in the
structural features, and the parameters of multiple models (MM)
are identified using a machine learning approach, namely prediction-error-based
clustering. The efficacy of the proposed approaches
is demonstrated on drug solubility data collected from the literature.
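The prediction-error-based clustering idea can be sketched as an alternating procedure in the spirit of k-means: fit one linear model per cluster by ordinary least squares, reassign each sample to the model that predicts it with the smallest squared error, and repeat until the assignment stops changing. This is a minimal NumPy sketch under assumptions of our own: the function names, the random initial partition, the fixed number of models, and the stopping rule are hypothetical, and the paper's own procedure (including its weight-based optimization) is not reproduced here.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least-squares fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Evaluate the linear model b0 + X @ b on every row of X."""
    return coef[0] + X @ coef[1:]


def error_clustering(X, y, n_models=2, n_iter=100, seed=0):
    """Hypothetical sketch: fit n_models local linear models by
    prediction-error-based clustering.

    Alternates between (1) refitting each cluster's model by OLS and
    (2) reassigning every sample to the model with the smallest squared
    prediction error, until the assignment stops changing.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_models, size=len(y))  # random initial partition (assumption)
    coefs = [None] * n_models
    for _ in range(n_iter):
        for k in range(n_models):
            members = labels == k
            if members.any():
                coefs[k] = fit_ols(X[members], y[members])
            elif coefs[k] is None:
                # Cluster empty on the first pass: start it from a global fit.
                coefs[k] = fit_ols(X, y)
            # A cluster that empties later keeps its previous coefficients.
        errors = np.column_stack([(y - predict(c, X)) ** 2 for c in coefs])
        new_labels = errors.argmin(axis=1)  # best-predicting model per sample
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return coefs, labels


# Example on synthetic data with a kink in a single descriptor.
X_demo = np.linspace(-3.0, 3.0, 41).reshape(-1, 1)
y_demo = np.abs(X_demo[:, 0]) + 1.0
coefs_demo, labels_demo = error_clustering(X_demo, y_demo, n_models=2)
```

Each refit and each reassignment can only keep or lower the total squared prediction error, so the error never increases from one pass to the next; as with k-means, however, the result is only a local optimum that depends on the initial partition, and trying several seeds is a common remedy.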
For comparison with the proposed MM approach, a neural-network-based
nonlinear model trained with the Levenberg–Marquardt algorithm is tested
in several configurations. A novel testing strategy is also
proposed to identify a suitable model for a test sample when the model
parameters are obtained by prediction-error-based clustering.
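When the local models come from prediction-error-based clustering, a new sample cannot be assigned by prediction error, because its solubility is unknown. The abstract does not spell out the proposed strategy; one plausible rule, shown purely as a hypothetical illustration, is a k-nearest-neighbour vote: find the training samples closest to the test sample in descriptor space and use the model of the cluster most of them belong to. The function names, the Euclidean distance, and the default k are assumptions.

```python
from collections import Counter

import numpy as np


def select_model(x_test, X_train, labels, k=5):
    """Hypothetical rule: majority cluster label among the k nearest training samples."""
    dists = np.linalg.norm(X_train - x_test, axis=1)  # Euclidean distance in descriptor space
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[nearest].tolist())
    return votes.most_common(1)[0][0]


def predict_test(x_test, X_train, labels, coefs, k=5):
    """Predict a test sample with the local linear model chosen by select_model."""
    model = select_model(x_test, X_train, labels, k)
    coef = coefs[model]  # [b0, b1, ..., bp] for the chosen cluster
    return model, coef[0] + x_test @ coef[1:]


# Example: two clusters on a single descriptor, one local model per cluster.
X_train = np.array([[-5.0], [-4.0], [-3.0], [-2.0], [-1.0],
                    [1.0], [2.0], [3.0], [4.0], [5.0]])
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
coefs = [np.array([1.0, 2.0]), np.array([1.0, -3.0])]
model, y_hat = predict_test(np.array([3.0]), X_train, labels, coefs)
```

Scaling the descriptors (for example to zero mean and unit variance) before computing distances is usually advisable, since descriptors such as molar refractivity and McGowan volume are on different scales.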