Abstract. Improving the flood forecasting ability of models is a key issue in hydrology, particularly in Mediterranean catchments that are subject to strong convective events. This contribution compared models of different complexity: the lumped GR4H, the continuous SMASH, and the process-oriented MARINE. The objective was to understand how they simulate catchment hydrological behavior, how their simulated discharge and soil moisture differ, and how these differences can help improve the relevance of the models. The study was applied to two Mediterranean catchments in the South of France. The methodology involved global sensitivity analysis, investigation of the response surface, calibration and validation, signature comparison at the event scale, and comparison of the simulated soil moisture with the outputs of the operational surface model SIM. The results revealed contrasting, catchment-specific parameter sensitivities to the same efficiency measure, and the response surface plots highlighted equifinality issues. All models were more sensitive to transfer parameters on the Gardon and to production parameters on the Ardeche. The exchange parameter controlling a non-conservative flow component of GR4H was found to be sensitive. All models had good calibration efficiencies, with MARINE having the highest and GR4H being more robust in validation. At the event scale, the discharge indices showed that the event-based MARINE was best at reproducing the peak and its timing, followed by SMASH, while GR4H performed worst in this respect. SMASH reproduced the volume of water exported comparatively better, followed by GR4H. Regarding the soil moisture simulated by the three models, with the outputs of SIM used as the benchmark, MARINE emerged as the most accurate in terms of both dynamics and amplitude. GR4H followed closely, while SMASH was the least accurate.
This study paves the way for testing and intercomparing extended model hypotheses and calibration-regionalization methods in the light of multi-sourced signatures, in order to assess and discriminate internal model behaviors. It highlights, in particular, the need for future investigations of combinations of vertical and lateral flow components, including groundwater exchanges, in distributed hydrological models, along with new optimization methods that make the best use, at the regional scale, of multi-source datasets composed of both physiographic data and hydrological signatures.
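As an illustration of the event-scale discharge indices referred to in the abstract (peak, peak timing, and exported volume), the following is a minimal sketch rather than the procedure used in the study. The function names, the relative-error definitions, and the hourly time step are assumptions made for this example. The abstract does not name its efficiency measure, so the Nash-Sutcliffe efficiency is included only as a commonly used stand-in.

```python
from math import fsum


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (assumed here; the abstract does not name its measure)."""
    mean_obs = fsum(obs) / len(obs)
    sse = fsum((o - s) ** 2 for o, s in zip(obs, sim))
    var = fsum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / var


def event_signatures(obs, sim, dt_hours=1.0):
    """Hypothetical event-scale indices comparing simulated and observed discharge.

    obs, sim: lists of discharge values over one flood event, same length.
    dt_hours: time step in hours (assumed hourly).
    """
    peak_obs, peak_sim = max(obs), max(sim)
    # Index of the first occurrence of each peak
    t_obs, t_sim = obs.index(peak_obs), sim.index(peak_sim)
    # With a constant time step, the volume ratio reduces to the ratio of sums
    vol_obs, vol_sim = fsum(obs), fsum(sim)
    return {
        "peak_rel_error": (peak_sim - peak_obs) / peak_obs,
        "peak_timing_error_h": (t_sim - t_obs) * dt_hours,
        "volume_rel_error": (vol_sim - vol_obs) / vol_obs,
    }


if __name__ == "__main__":
    # Synthetic values for illustration only, not data from the study
    observed = [1.0, 2.0, 10.0, 4.0, 2.0]
    simulated = [1.0, 3.0, 8.0, 5.0, 2.0]
    print(nse(observed, simulated))
    print(event_signatures(observed, simulated))
```

In this synthetic example the simulated peak is underestimated while the timing and the total volume match, which shows why several indices are needed to discriminate between models with similar overall efficiency.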