Widely adopted models for estimating channel geometry attributes rely on
simplistic power-law (hydraulic geometry) equations. This study presents
a new generation of channel geometry models based on a hybrid approach
that combines a traditional statistical method, Multi-Linear Regression
(MLR), with advanced tree-based Machine Learning (ML) algorithms, namely
Random Forest Regression (RFR) and eXtreme Gradient Boosting Regression
(XGBR), using novel datasets. To achieve this, a new preprocessing
method was applied to refine an extensive observational dataset, namely
the HYDRoacoustic dataset supporting Surface Water Oceanographic
Topography (HYDRoSWOT). This process improved data quality and
identified observations representing bankfull and mean-flow conditions.
A compiled dataset, combining the preprocessed observations with
additional catchment attributes from sources such as the National
Hydrography Dataset Plus (NHDPlusV2.1), was then used to train a suite of
models to predict channel width and depth under bankfull and mean-flow
conditions.
The analysis shows that tree-based ML algorithms outperform traditional
statistical methods in accuracy and in handling the data, but they are
limited in predicting for streams whose characteristics fall outside the
training range. Consequently, a hybrid method was selected,
combining XGBR for streams within the dataset range and MLR for those
outside it. Two tiers of models were developed for each attribute, using
discharges derived from HYDRoSWOT (first tier) and NHDPlusV2.1 (second
tier); the second tier is applicable to approximately 2.6 million
streams within NHDPlusV2.1.
Comprehensive independent evaluations were conducted to assess the
ability of the developed models to provide stream/reach-averaged
(rather than at-a-station) predictions for locations outside the
training and testing datasets.
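
The hybrid routing described above can be illustrated with a minimal
sketch. It is not the study's implementation. It assumes a single
predictor (discharge), uses a nearest-sample lookup as a simple
piecewise-constant stand-in for a tree-based model such as XGBR, and uses
a log-log power-law fit as a stand-in for MLR. All function names
(fit_power_law, make_nearest_lookup, make_hybrid) and the data values are
hypothetical and synthetic, chosen only for illustration.

```python
import math


def fit_power_law(discharge, width):
    """Fit width = a * discharge**b by least squares in log-log space."""
    xs = [math.log(q) for q in discharge]
    ys = [math.log(w) for w in width]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def make_nearest_lookup(discharge, width):
    """Piecewise-constant stand-in for a tree-based model (NOT XGBR itself).

    Like a tree ensemble, it cannot predict beyond the values it has seen.
    """
    pairs = list(zip(discharge, width))

    def predict(q):
        # Return the width of the training sample closest in discharge.
        return min(pairs, key=lambda p: abs(p[0] - q))[1]

    return predict


def make_hybrid(in_range_model, fallback_model, q_min, q_max):
    """Route to in_range_model inside [q_min, q_max], else to fallback_model."""

    def predict(q):
        if q_min <= q <= q_max:
            return in_range_model(q)
        return fallback_model(q)

    return predict


# Synthetic, illustrative data only: width = 2 * discharge**0.5.
train_q = [1.0, 10.0, 100.0, 1000.0]
train_w = [2.0 * q ** 0.5 for q in train_q]

a, b = fit_power_law(train_q, train_w)
lookup = make_nearest_lookup(train_q, train_w)
hybrid = make_hybrid(lookup, lambda q: a * q ** b, min(train_q), max(train_q))
```

In this sketch, lookup(10000.0) returns the same width as for a discharge
of 1000.0, because the stand-in plateaus beyond the training range, while
hybrid(10000.0) falls back to the fitted power law and keeps scaling with
discharge. A real application would check the range of every predictor,
not discharge alone.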