Abstract. Estimation of flood quantiles at ungauged basins is often achieved through regression-based methods. In situations where flood retention is important, e.g. floodplain management and reservoir design, flood quantile estimates are often needed at multiple durations. This poses a problem for regression-based models, as the form of the functional relationship between catchment descriptors and the response may not be constant across durations. A type of regression model that is well suited to this situation is the generalized additive model (GAM), which allows for flexible, semi-parametric modeling and visualization of the relationship between predictors and the response. However, in practice, selecting predictors for such a flexible model can be challenging, particularly given the characteristics of available catchment descriptor datasets. We employ a machine learning-based variable pre-selection tool which, when combined with domain knowledge, enhances the practicality of constructing GAMs. In this study, we develop a GAM for index (median) flood estimation with the primary objective of investigating duration-specific differences in how catchment descriptors influence the median flood. As the accuracy of this explainable approach depends on the fitted GAM being adequate, the secondary objective of our study is prediction of the median flood at ungauged locations and multiple durations, where predictive performance and reliability at ungauged locations are used as proxies for adequacy of the GAM. Predictive performance of the GAM is compared to two benchmark models: the existing log-linear model for median flood estimation in Norway and a fully data-driven machine learning model (an extreme gradient boosting tree ensemble, XGBoost). We find that the predictive accuracy and reliability of the GAM matched or exceeded those of the benchmark models at both durations studied.
Within the predictor set selected for this study, we observe duration-specific differences in the relationship between the median flood and two catchment descriptors: effective lake percentage and catchment shape. Ignoring these differences results in a statistically significant decline in predictive performance. This suggests that models developed and estimated for prediction of the index flood at one duration may perform worse when applied directly at other durations.