Summary

The Office of Hydrologic Development (OHD) of the National Oceanic and Atmospheric Administration's (NOAA) National Weather Service (NWS) conducted the second phase of the Distributed Model Intercomparison Project (DMIP 2). After DMIP 1, the NWS recognized the need for additional science experiments to guide its research-to-operations path towards advanced hydrologic models for river and water resources forecasting. This was accentuated by the need to develop a broader spectrum of water resources forecasting products (such as soil moisture) in addition to the more traditional river, flash flood, and water supply forecasts. As it did for DMIP 1, the NWS sought the input and contributions from the hydrologic research community. DMIP 1 showed that using operational precipitation data, some distributed models could indeed perform as well as lumped models in several basins and better than lumped models for one basin. However, in general, the improvements were more limited than anticipated by the scientific community. Models combining so-called conceptual rainfall-runoff mechanisms with physically-based routing schemes achieved the best overall performance. Clear gains were achieved through calibration of model parameters, with the average performance of calibrated models being better than uncalibrated models. DMIP 1 experiments were hampered by temporally-inconsistent precipitation data and few runoff events in the verification period for some basins. Greater uncertainty in modeling small basins was noted, pointing to the need for additional tests of nested basins of various sizes.

DMIP 2 experiments in the Oklahoma (OK) region were more comprehensive than in DMIP 1, and were designed to improve our understanding beyond what was learned in DMIP 1. Many more stream gauges were located, allowing for more rigorous testing of simulations at interior points. These included two new gauged interior basins that had drainage areas smaller than the smallest in DMIP 1.
Soil moisture and routing experiments were added to further assess if distributed models could accurately model basin-interior processes. A longer period of higher quality precipitation data was available, and facilitated a test to note the impacts of data quality on model calibration. Moreover, the DMIP 2 calibration and verification periods contained more runoff events for analysis. Two lumped models were used to define a robust benchmark for evaluating the improvement of distributed models compared to lumped models. Fourteen groups participated in DMIP 2 using a total of sixteen models. Ten of these models were not in DMIP 1. This paper presents the motivation for DMIP 2 Oklahoma experiments, discusses the major project elements, and describes the data and models used. In addition, the paper introduces the findings, which are covered in a companion results paper (Smith et al., this issue). Lastly, the paper summarizes the DMIP 1 and 2 experiments with commentary from the NWS perspective. Future papers will cover the DMIP 2 experiments in the western...