Accurate streamflow predictions are essential for water resources management. Recent studies have examined hybrid models that integrate machine learning models with process‐based (PB) hydrologic models to improve streamflow predictions. Yet many questions remain open regarding optimal hybrid model construction, especially in Mediterranean‐climate watersheds that experience pronounced wet and dry seasons. In this study, we performed model benchmarking to (a) compare hybrid model performance to that of PB and machine learning models and (b) examine the sensitivity of hybrid model performance to PB model parameter calibration, structural complexity, and variable selection. Hybrid models were generated by post‐processing PB model outputs with Long Short‐Term Memory (LSTM) neural networks. Models were benchmarked in two northern California watersheds that are managed for both municipal water supplies and aquatic habitat. Though model performance varied substantially by watershed and error metric, calibrated hybrid models frequently outperformed both the machine learning model (for 72% of watershed‐model‐metric combinations) and the calibrated PB models (for 79% of combinations). Furthermore, hybrid models were relatively insensitive to PB model calibration and structural complexity, but sensitive to PB model variable selection. Our results demonstrate that hybrid models can improve streamflow prediction in Mediterranean‐climate watersheds. The insensitivity of hybrid models to PB model parameter calibration and structural complexity suggests that uncalibrated or less complex PB models could be used in hybrid models with little loss of streamflow prediction accuracy, improving model construction efficiency. In addition, the sensitivity of hybrid models to the selection of PB model variables suggests a strategy for diagnosing poorly performing PB model components.
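As an illustration of the post‐processing setup described above, the following minimal sketch shows one way the inputs to an LSTM post‐processor might be assembled: meteorological forcings are stacked with a chosen subset of PB model output variables and cut into fixed‐length lookback windows, each paired with the observed streamflow at the end of the window. This is a sketch under stated assumptions, not the study's implementation. The function name `build_hybrid_samples`, the variable names, the lookback length, and all numeric values are hypothetical, and the LSTM itself is omitted.

```python
from typing import Dict, List, Sequence, Tuple

Window = List[List[float]]  # shape: lookback x n_features


def build_hybrid_samples(
    forcings: Dict[str, Sequence[float]],
    pb_outputs: Dict[str, Sequence[float]],
    observed: Sequence[float],
    pb_variables: Sequence[str],
    lookback: int,
) -> List[Tuple[Window, float]]:
    """Pair lookback windows of (forcings + selected PB outputs) with observed flow.

    All names and the windowing scheme are illustrative assumptions.
    """
    # Feature order: every forcing series, then the selected PB model variables.
    series = list(forcings.values()) + [pb_outputs[name] for name in pb_variables]
    n_steps = len(observed)
    if any(len(s) != n_steps for s in series):
        raise ValueError("all series must have the same length as `observed`")
    if not 1 <= lookback <= n_steps:
        raise ValueError("lookback must be between 1 and the series length")

    samples = []
    for end in range(lookback, n_steps + 1):
        # One row per time step, one column per feature.
        window = [[float(s[t]) for s in series] for t in range(end - lookback, end)]
        # Target is the observed streamflow at the last step of the window.
        samples.append((window, float(observed[end - 1])))
    return samples


# Toy, made-up values purely to show the shapes involved.
forcings = {"precip": [1.0, 0.0, 2.0, 0.0], "temp": [10.0, 11.0, 12.0, 13.0]}
pb_outputs = {
    "streamflow": [0.5, 0.4, 0.9, 0.6],
    "soil_moisture": [0.2, 0.2, 0.3, 0.3],
}
observed = [0.6, 0.5, 1.0, 0.7]

samples = build_hybrid_samples(
    forcings, pb_outputs, observed, pb_variables=["streamflow"], lookback=2
)
```

Changing `pb_variables` (for example, adding `"soil_moisture"`) is the kind of variable‐selection choice whose effect on hybrid model performance the benchmarking examines; an uncalibrated PB model would simply supply different values in `pb_outputs`.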