The solubility parameter is widely used to select suitable solvents
for polymers in the polymer-processing industry. In this study, we
established a Hildebrand solubility parameter prediction model using
ensemble-learning methods. The data were taken from
the 2019 edition of the DIPPR 801 database and comprise solubility
parameters for 1889 chemicals after removal of invalid entries and outliers.
Three machine-learning techniques, namely random forest, gradient
boosting, and extreme gradient (XG) boosting, were implemented to develop
quantitative structure–property relationship (QSPR)
models. Subsequently, an ensemble method was applied to achieve higher
accuracy. The coefficient of determination (R²) and root-mean-square
error (RMSE) were calculated to validate the models. The
ensemble-learning models achieved satisfactory predictive capability,
with an overall R² of 0.9793 and an RMSE of 785.3313.
Compared with determining the solubility parameter
experimentally, the ensemble-learning models can evaluate a large
number of compounds within a few seconds. The models can therefore be used
to predict promising solvents for newly developed polymers at a much lower time cost.
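
As an illustration of the workflow summarized above, the following minimal sketch combines the predictions of three base models and evaluates the result with R² and RMSE. It rests on several assumptions: the abstract does not say how the base models were combined, so a plain unweighted average is used here; the function names (`average_ensemble`, `r2_score`, `rmse`) are hypothetical; and all numerical values are made up for illustration and are not data from the study.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def average_ensemble(predictions):
    """Unweighted average of per-sample predictions (assumed scheme)."""
    n_models = len(predictions)
    return [sum(sample) / n_models for sample in zip(*predictions)]


# Hypothetical reference values and base-model predictions (illustration only).
y_true = [15000.0, 18000.0, 21000.0, 24000.0]
base_predictions = [
    [15300.0, 17700.0, 21600.0, 23400.0],  # stand-in for random forest
    [14700.0, 18600.0, 20700.0, 24300.0],  # stand-in for gradient boosting
    [15300.0, 17400.0, 21000.0, 24000.0],  # stand-in for XG boosting
]

y_ensemble = average_ensemble(base_predictions)
ensemble_r2 = r2_score(y_true, y_ensemble)
ensemble_rmse = rmse(y_true, y_ensemble)
base_rmses = [rmse(y_true, p) for p in base_predictions]

print(f"ensemble R2 = {ensemble_r2:.4f}, RMSE = {ensemble_rmse:.1f}")
print("base-model RMSEs:", [round(v, 1) for v in base_rmses])
```

In this toy case the averaged prediction has a lower RMSE than any single base model because the individual errors partly cancel. That is the usual motivation for ensembling, but it is not guaranteed for every dataset.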