Recently, the Ramprasad group reported a quantitative structure–property
relationship (QSPR) model for predicting the E_gap values of 4209
polymers, which yielded a test set R² score of 0.90 and a test set
root-mean-square error (RMSE) score of 0.44 at a train/test split ratio
of 80/20. In
this paper, we present a new QSPR model named LGB-Stack, which performs
a two-level stacked generalization using the light
gradient boosting machine. At level 1, multiple weak models are trained,
and at level 2, they are combined into a strong final model. Four
molecular fingerprints were generated from the simplified molecular-input
line-entry system (SMILES) notations of the polymers. They were trimmed
using recursive feature elimination and used as the initial input
features for training the weak models. The output predictions of the
weak models were used as the new input features for training the final
model, which completes the LGB-Stack model training process. Our results
show that the best test set R² and RMSE scores of LGB-Stack at the
train/test split ratio of 80/20 were 0.92 and 0.41, respectively.
The R² and RMSE scores improved further to 0.94 and 0.34, respectively,
when a train/test split ratio of 95/5 was used.
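
The two-level procedure described above can be illustrated with a minimal sketch. It is not the LGB-Stack implementation: to stay runnable with the Python standard library alone, ordinary least-squares regressors stand in for the light gradient boosting machine, three synthetic feature columns stand in for the fingerprint-derived inputs, and the level-2 training features are assumed to be out-of-fold predictions (a common stacking practice that the text does not specify). All function names and the synthetic data are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_linear(X, y):
    """Least squares with intercept (normal equations); stand-in learner."""
    Xb = [row + [1.0] for row in X]
    k = len(Xb[0])
    A = [[sum(r[i] * r[j] for r in Xb) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(k)]
    return solve(A, b)

def predict_linear(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row + [1.0])) for row in X]

def r2_score(y_true, y_pred):
    my = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def stack_fit_predict(X_train, y_train, X_test, n_folds=5):
    n, n_feat = len(X_train), len(X_train[0])
    col = lambda X, j: [[row[j]] for row in X]
    # Level 1: each weak model sees one feature column; collect its
    # out-of-fold predictions on the training set (assumed scheme).
    oof = [[0.0] * n_feat for _ in range(n)]
    for f in range(n_folds):
        hold = [i for i in range(n) if i % n_folds == f]
        keep = [i for i in range(n) if i % n_folds != f]
        for j in range(n_feat):
            w = fit_linear([[X_train[i][j]] for i in keep],
                           [y_train[i] for i in keep])
            preds = predict_linear(w, [[X_train[i][j]] for i in hold])
            for i, p in zip(hold, preds):
                oof[i][j] = p
    # Refit the weak models on the full training set to build test features.
    weak = [fit_linear(col(X_train, j), y_train) for j in range(n_feat)]
    test_feats = [list(t) for t in zip(
        *(predict_linear(weak[j], col(X_test, j)) for j in range(n_feat)))]
    # Level 2: the final model is trained on the weak models' predictions.
    final = fit_linear(oof, y_train)
    return predict_linear(final, test_feats), test_feats

# Synthetic regression data (hypothetical), split 80/20 into train and test.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [2.0 * a - 1.5 * b + 1.0 * c + rng.gauss(0, 0.1) for a, b, c in X]
split = int(0.8 * len(X))
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

y_hat, test_feats = stack_fit_predict(X_tr, y_tr, X_te)
stack_r2, stack_rmse = r2_score(y_te, y_hat), rmse(y_te, y_hat)
weak_r2 = [r2_score(y_te, [row[j] for row in test_feats]) for j in range(3)]
print("weak R2:", weak_r2)
print("stacked R2:", stack_r2, "stacked RMSE:", stack_rmse)
```

In this toy setting, each weak model explains only part of the target, while the level-2 model recovers most of it by combining their predictions. To move toward the setup described in the text, `fit_linear` and `predict_linear` could be replaced by the `fit` and `predict` methods of a gradient boosting regressor such as `lightgbm.LGBMRegressor`, with each weak model trained on one feature-selected fingerprint rather than on a single column.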