Hydrologic models are the primary tools used to simulate streamflow drought and assess its impacts. However, there is little consensus on how to evaluate the performance of these models, especially as hydrologic modeling moves toward larger spatial domains. This paper presents a comprehensive multi-objective approach for systematically evaluating the critical features of streamflow drought simulations from two widely used hydrologic models. The evaluation approach captures how well a model classifies observed periods of drought and non-drought, quantifies error components during periods of drought, and assesses the model's simulations of drought severity, duration, and intensity. We apply this approach at 4662 U.S. Geological Survey streamflow gages covering a wide range of hydrologic conditions across the conterminous U.S. from 1985 to 2016 to evaluate streamflow drought simulated by two national-scale hydrologic models: the National Water Model (NWM) and the National Hydrologic Model (NHM). The results provide a benchmark against which additional models can be evaluated. We find that the NWM generally better simulates the timing of flows during drought, while the NHM better simulates the magnitude of flows during drought. Both models performed better in wetter eastern regions than in drier western regions. Finally, each model showed increased error when simulating the most severe drought events.
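
To make the three evaluation components concrete, the following is a minimal illustrative sketch in Python, not the implementation used in this study. It assumes a fixed percentile threshold computed from each series to define drought, uses Cohen's kappa as one possible measure of drought/non-drought classification agreement, and defines severity as the cumulative flow deficit below the threshold and intensity as severity divided by duration. The function names (`drought_mask`, `cohens_kappa`, `drought_events`) and the 20th-percentile default are hypothetical choices made for illustration only.

```python
import numpy as np


def drought_mask(flow, pct=20.0):
    """Flag time steps below a fixed percentile threshold (assumed definition)."""
    flow = np.asarray(flow, dtype=float)
    threshold = np.percentile(flow, pct)
    return flow < threshold, threshold


def cohens_kappa(obs_mask, sim_mask):
    """Chance-corrected agreement between observed and simulated drought flags."""
    obs = np.asarray(obs_mask, dtype=bool)
    sim = np.asarray(sim_mask, dtype=bool)
    p_agree = np.mean(obs == sim)
    p_chance = obs.mean() * sim.mean() + (1 - obs.mean()) * (1 - sim.mean())
    if p_chance == 1.0:
        # Both series fall entirely in one class; kappa is undefined.
        return float("nan")
    return float((p_agree - p_chance) / (1 - p_chance))


def drought_events(flow, mask, threshold):
    """Summarize each contiguous drought run by duration, severity, intensity."""
    flow = np.asarray(flow, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so runs touching either end of the record are closed.
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last index of each run
    events = []
    for s, e in zip(starts, ends):
        duration = int(e - s)
        severity = float(np.sum(threshold - flow[s:e]))  # cumulative deficit
        events.append({
            "start": int(s),
            "duration": duration,
            "severity": severity,
            "intensity": severity / duration,
        })
    return events


if __name__ == "__main__":
    # Synthetic series for demonstration only; not data from the study.
    observed = np.array([10, 9, 2, 1, 8, 7, 3, 9, 10, 11], dtype=float)
    simulated = np.array([11, 8, 3, 4, 9, 6, 2, 8, 9, 12], dtype=float)
    obs_flags, obs_thr = drought_mask(observed, pct=30.0)
    sim_flags, sim_thr = drought_mask(simulated, pct=30.0)
    print("kappa:", cohens_kappa(obs_flags, sim_flags))
    print("observed events:", drought_events(observed, obs_flags, obs_thr))
```

In this sketch, each model's threshold is computed from its own simulated series, so the classification comparison reflects the timing of low flows rather than bias in their magnitude. Applying the observed threshold to the simulated series instead would fold magnitude bias into the classification score.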