Abstract. Data-driven flow forecasting models, such as Artificial Neural Networks (ANNs), are increasingly used in operational flood warning systems. However, flow distributions are highly imbalanced, which results in poor prediction accuracy on high flows, in terms of both amplitude and timing error. Resampling and ensemble techniques have been shown to improve model performance on imbalanced datasets such as streamflow. In this research, we systematically evaluate and compare three resampling techniques (random undersampling (RUS), random oversampling (ROS), and SMOTER) and four ensemble techniques (randomised weights and biases, bagging, adaptive boosting (AdaBoost), and least squares boosting (LSBoost)) on their ability to improve high flow prediction accuracy using ANNs. The methods are implemented both independently and in combination as hybrid techniques. Some of these combinations have been explored in the broader machine learning literature. However, this research includes many of the first applications of these algorithms to the imbalance problem inherent in flood and high flow forecasting models. Specifically, this research presents the implementation of ROS and new approaches for SMOTER, LSBoost, and SMOTER-AdaBoost. Data from two Canadian watersheds with distinct hydrological systems, the Bow River in Alberta and the Don River in Ontario, are used as the basis for comparing the methods. The models are evaluated on overall performance and on high flows. The results indicate that resampling produces only marginal improvements in high flow prediction accuracy, whereas ensemble methods produce more substantial improvements, with or without a resampling method. For highly imbalanced flow datasets, ensemble methods are therefore recommended over simple ANN flow forecast models to reduce amplitude and timing errors.