Predictive models are useful tools for aqueous adsorption research.
However, existing models such as multilinear regression (MLR) can
only predict adsorption at specific equilibrium concentrations
or for certain adsorption isotherm models. Also, few studies have
discussed data processing beyond applying different modeling algorithms
to improve the prediction accuracy. In this research, we employed
a cosine similarity approach that mines the available data before
model development: it selects the data most relevant to the prediction
target for model building and was found to considerably improve
the prediction accuracy. We then built a machine-learning
modeling process based on neural networks (NN), a group-selection
data-splitting strategy for adsorption data grouped by adsorbent–adsorbate
pair across different equilibrium concentrations, and polyparameter
linear free energy relationships (pp-LFERs) for aqueous adsorption
of 165 organic compounds onto 50 biochars, 34 carbon nanotubes, 35
granular activated carbons (GACs), and 30 polymeric resins. The final
NN-LFER models were successfully
applied to various equilibrium concentrations regardless of the adsorption
isotherm models and showed smaller prediction deviations than the published
models, with root-mean-square errors of 0.23–0.31 versus 0.23–0.97
log units, and the predictions were improved by adding two key descriptors
(BET surface area and pore volume) for the adsorbents. Finally, interpreting
the NN-LFER models with Shapley values suggested that the omission
of equilibrium concentration and adsorbent properties
from the existing MLR models is a possible reason for their higher prediction
deviations.
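
The data-mining and data-splitting steps described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the similarity threshold of 0.9, the test fraction, and the toy arrays are all hypothetical, and the abstract does not specify which feature vectors are compared or how the similarity cut-off is chosen.

```python
import numpy as np


def cosine_similarity(matrix, target):
    """Cosine similarity between each row of `matrix` and the vector `target`."""
    matrix = np.asarray(matrix, dtype=float)
    target = np.asarray(target, dtype=float)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(target)
    # Replace zero norms by 1 so all-zero rows get similarity 0 instead of NaN.
    return (matrix @ target) / np.where(norms == 0.0, 1.0, norms)


def select_relevant(features, target, threshold=0.9):
    """Indices of rows whose similarity to `target` is at least `threshold`.

    The threshold value is an assumption made for this sketch.
    """
    sims = cosine_similarity(features, target)
    return np.flatnonzero(sims >= threshold)


def group_split(groups, test_fraction=0.2, seed=0):
    """Split row indices so that rows sharing a group label stay together.

    Here a "group" stands for one adsorbent-adsorbate pair measured at
    several equilibrium concentrations (an assumption about the grouping).
    """
    groups = np.asarray(groups)
    unique = np.unique(groups)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique)
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_mask = np.isin(groups, shuffled[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy descriptor vectors (hypothetical values) and a prediction target.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 1.0]])
target = np.array([1.0, 0.0])
relevant = select_relevant(features, target)

# Five hypothetical adsorbent-adsorbate pairs, two concentrations each.
pair_ids = np.array(["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"])
train_idx, test_idx = group_split(pair_ids)
```

Splitting by group rather than by row keeps every concentration point of a given pair on the same side of the split, so the test set contains only pairs the model has not seen during training.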