Unfrozen water content (UWC) is a key parameter affecting a variety of
soil physical-mechanical properties and processes in frozen soil
systems. However, traditional estimation models suffer from limitations
arising from oversimplified assumptions or narrow ranges of applicable
conditions. There is therefore a need to explore alternative modeling
approaches that leverage machine learning (ML) algorithms, which have
shown increasing potential in engineering fields. To this end, this
study evaluated and compared six widely known ML algorithms: three
ensemble models, namely random forest (RF), LightGBM, and XGBoost, and
three non-ensemble models, namely k-nearest neighbors (KNN), support
vector regression (SVR), and a backpropagation neural network (BPNN),
for modeling UWC based on collected experimental datasets. These
algorithms were optimized and evaluated
using a framework combining Bayesian optimization and cross-validation
to ensure model stability and generalization. The results demonstrated
that the ensemble tree-based methods, particularly LightGBM and XGBoost,
achieved the highest predictive accuracy and superior overall
performance. In contrast, the non-ensemble methods generalized less
well. Interestingly, during 10-fold cross-validation, one particular
fold consistently underperformed, possibly because of the data
distribution in that fold after random shuffling. The present study
highlights the effectiveness of ensemble learning approaches, the
importance of proper hyperparameter tuning and validation strategies,
and the intrinsic modeling challenges arising from the difference
between freezing and thawing phase-change behaviors. This comprehensive
comparison of ML models and the robust training framework provide
valuable guidance on selecting suitable data-driven techniques for
modeling frozen soil properties in cold-region hydrogeology and
engineering practice.
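
The tuning-and-validation workflow summarized above can be sketched as follows. This is a minimal illustration only, under loudly stated assumptions: the data are synthetic, the model is a hand-written KNN regressor, and a plain grid search stands in for Bayesian optimization. The function names, the candidate values of k, and the power-law data generator are hypothetical and are not the study's dataset, models, or tuning procedure.

```python
import math
import random


def make_synthetic_data(n=200, seed=0):
    """Hypothetical (temperature, UWC) pairs; not measured soil data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        temp = rng.uniform(-20.0, -0.1)  # sub-zero temperature, deg C
        # Illustrative power-law shape plus noise (an assumption).
        uwc = 0.05 + 0.1 * abs(temp) ** -0.5 + rng.gauss(0.0, 0.005)
        data.append((temp, uwc))
    return data


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def kfold_indices(n, n_folds, seed):
    """Shuffle indices once, then deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_validate(data, k, n_folds=10, seed=42):
    """Return the RMSE of each held-out fold for a KNN with k neighbors."""
    scores = []
    for held_out in kfold_indices(len(data), n_folds, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        sq_err = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return scores


data = make_synthetic_data()

# Grid search over k: a simple stand-in for Bayesian optimization.
candidates = [1, 3, 5, 9, 15]
mean_rmse = {k: sum(cross_validate(data, k)) / 10 for k in candidates}
best_k = min(mean_rmse, key=mean_rmse.get)

# Inspect the folds individually to see whether one is clearly worse.
fold_rmse = cross_validate(data, best_k)
worst_fold = max(range(len(fold_rmse)), key=fold_rmse.__getitem__)

print("best k:", best_k)
for i, rmse in enumerate(fold_rmse):
    print(f"fold {i}: RMSE = {rmse:.4f}")
print("weakest fold:", worst_fold)
```

Reporting the score of each fold, rather than only the mean, is what makes a consistently weak fold visible. Repeating the run with a different shuffle seed would indicate whether the weakness follows the data split or the model.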