Predicting the permeate flux is critical for evaluating and optimizing
the performance of the forward osmosis (FO) process. However,
solution diffusion models have poor applicability in assessing the
FO process. Recently, the data-driven eXtreme Gradient Boosting (XGBoost)
algorithm has proven effective in processing structured
data in engineering problems but has not yet been used to assess the
FO process. Herein, a combination of the XGBoost model with a genetic
algorithm (GA) is proposed for the first time to predict the permeate flux, and
its superiority in the FO process is highlighted through comparison with the support
vector regression (SVR) model, the artificial neural network (ANN),
and multiple linear regression (MLR). Moreover, the performance
of these models was optimized by tuning their hyperparameters with the
GA and compared via a Taylor diagram. Among these machine
learning (ML) models, the GA-based XGBoost model is superior to the
other three models in terms of mean square error (MSE, 2.7326) and
coefficient of determination (R², 0.9721)
on the test data, and its prediction power was compared to that of
the solution diffusion (SD) model in the literature. Finally, the
importance of the features that affect the permeate flux
in the FO process was examined using SHapley Additive exPlanations
(SHAP) to estimate the contribution of each variable. The
results demonstrated that the XGBoost model could predict the permeate
flux in the FO system with high accuracy and good generalization ability
on the given data set and even on unseen data. Furthermore, the
findings of the SHAP method show that the osmotic pressure difference,
the osmotic pressure difference of the draw solution and feed solution,
the crossflow velocity of the feed solution and draw solution, and
the water permeability coefficient have a significant impact on water
flux.
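
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how MSE and R² are conventionally computed from measured and predicted flux values. The function names `mse` and `r_squared` and the sample numbers are hypothetical illustrations; they are not taken from the study's code or data.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted permeate flux values (illustrative only).
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

print("MSE:", mse(measured, predicted))
print("R^2:", r_squared(measured, predicted))
```

A lower MSE and an R² closer to 1 both indicate that predictions track the measurements more closely, which is the sense in which the models above are ranked on the test data.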