Accurate prediction of methane production in anaerobic digestion
with various pretreatment strategies is of the utmost importance for
efficient sludge treatment and resource recovery. Traditional machine
learning (ML) algorithms have shown limited prediction accuracy due
to challenges in optimizing complex parameters and the scarcity of
data. This work proposed a novel integrated system that employed an
ensemble semisupervised learning (SSL)-automated ML (AutoML) model
with limited variable inputs to reveal the effects of different pretreatments
on methane production during sludge digestion with explainable analysis.
Because the pretreatment type correlates directly with the digestion
substrates, the pretreatment type was treated as a hidden variable.
Results demonstrated that the AutoML model outperformed conventional
ML models (e.g., support vector regression (SVR) and extreme gradient
boosting (XGB)), as evidenced by its higher R² value. Moreover, the
integration of SSL further enhanced
the prediction accuracy by effectively leveraging unlabeled data,
leading to a reduction in the mean squared error from 11.3 to 9.7.
Explainable analysis ranked the input variables by importance: operating
time was the most important, followed by
proteins, carbohydrates, chemical oxygen demand, and volatile fatty
acids. Furthermore, principal component and correlation analysis unveiled
the interconnected relationships among substrate concentration, microbial
communities, and metabolic functions for methane production and found
that increasing substrate concentration promoted the enrichment
of functional microorganisms and metabolic functions. These insights shed
light on the advantages of SSL-AutoML in predicting methane production
in anaerobic digestion systems and elucidate its dependence
on key variables, offering valuable guidance for effective sludge
pretreatment with enhanced resource recovery capabilities.
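
To make the semisupervised step concrete, the sketch below shows one common SSL scheme, pseudo-labeling, together with the two metrics named above (mean squared error and R²). It is an illustration under stated assumptions, not the model used in this work: the k-nearest-neighbor regressor stands in for the AutoML model, the two input features and the linear target function are synthetic and hypothetical, and all function names are chosen here for illustration. Whether pseudo-labels help on a given dataset is an empirical question, so no particular improvement should be expected from this toy data.

```python
import random


def knn_predict(X_train, y_train, x, k=3):
    """Average the targets of the k labeled rows closest to x."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(X_train, y_train)
    )
    nearest = scored[:k]
    return sum(y for _, y in nearest) / len(nearest)


def pseudo_label(X_lab, y_lab, X_unlab, k=3):
    """Predict targets for unlabeled rows and append them to the labeled set."""
    pseudo = [knn_predict(X_lab, y_lab, x, k) for x in X_unlab]
    return list(X_lab) + list(X_unlab), list(y_lab) + pseudo


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic, hypothetical data: two features and a made-up linear response.
rng = random.Random(0)


def synthetic_yield(x):
    return 2.0 * x[0] + 0.5 * x[1]


def sample(n):
    return [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n)]


X_lab = sample(10)                       # few labeled measurements
y_lab = [synthetic_yield(x) + rng.gauss(0, 0.5) for x in X_lab]
X_unlab = sample(40)                     # many unlabeled measurements
X_test = sample(20)
y_test = [synthetic_yield(x) for x in X_test]

# Supervised baseline versus the pseudo-label-augmented training set.
X_aug, y_aug = pseudo_label(X_lab, y_lab, X_unlab)
pred_sup = [knn_predict(X_lab, y_lab, x) for x in X_test]
pred_ssl = [knn_predict(X_aug, y_aug, x) for x in X_test]

print("supervised MSE:", mean_squared_error(y_test, pred_sup),
      "R2:", r_squared(y_test, pred_sup))
print("pseudo-label MSE:", mean_squared_error(y_test, pred_ssl),
      "R2:", r_squared(y_test, pred_ssl))
```

In practice the pseudo-labeling step is often repeated, or restricted to unlabeled rows on which the model is most confident; this sketch runs a single unfiltered round for brevity.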