Streamflow forecasts often perform poorly because the underlying models misrepresent hydrologic response timescales. Here, we use transfer entropy (TE), which measures information flow between variables, to identify dominant drivers of discharge and their timescales using sensor data from the Dry Creek Experimental Watershed, ID, USA. Consistent with previous mechanistic studies, TE revealed that snowpack accumulation and its partitioning into melt, recharge, and evaporative loss dominated discharge patterns, and that snow‐sourced baseflow reduced uncertainty in discharge more than any other variable. We hypothesized that machine learning models (MLMs) specified in accordance with the dominant lag timescales identified via TE would outperform timescale‐agnostic models. However, while lagged‐variable random forest regressions captured the dominant process, seasonal snowmelt, they ultimately did not perform as well as unlagged models, provided those models were specified with input data aggregated over a range of timescales. Unlagged models, not constrained by the timescales of the dominant processes, more effectively represented the variable interactions (e.g., rain‐on‐snow events) that play a critical role in translating precipitation into streamflow over long, intermediate, and short timescales. Meanwhile, long short‐term memory (LSTM) models were effective at internally identifying the key lag and aggregation scales for predicting discharge. Parsimonious specification of LSTM models, using only daily unlagged precipitation and temperature data, produced the highest performing predictions. Our findings suggest that TE can identify dominant streamflow controls and the relative importance of different streamflow generation mechanisms, which is useful for establishing process baselines and fingerprinting watersheds. However, restricting MLMs to the dominant timescales undercuts their skill at learning those timescales internally.
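To make the core quantity concrete: transfer entropy from a source X to a target Y at lag k is the conditional mutual information I(Y_t; X_{t-k} | Y_{t-1}), i.e., the reduction in uncertainty about discharge given a lagged driver, beyond what discharge's own history already provides. The sketch below is a minimal plugin (histogram) estimator in NumPy; the function name, the equal-width binning, the single-step target history, and the synthetic coupled series are illustrative assumptions, not the paper's implementation (which would use its own estimator, bin choices, and significance testing).

```python
import numpy as np

def transfer_entropy(x, y, lag=1, bins=4):
    """Plugin (histogram) estimate, in bits, of transfer entropy from x to y:
    TE = I(Y_t ; X_{t-lag} | Y_{t-1}).

    Illustrative sketch only: equal-width bins and a one-step target history.
    """
    x, y = np.asarray(x), np.asarray(y)
    # Discretize each series into integer bin labels 0..bins-1
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    # Align the three variables: Y_t, Y_{t-1}, X_{t-lag}
    a, b, c = yd[lag:], yd[lag - 1:-1], xd[:-lag]
    n = a.size
    # Joint distribution p(y_t, y_{t-1}, x_{t-lag}) from counts
    idx = (a * bins + b) * bins + c
    p_abc = np.bincount(idx, minlength=bins**3).reshape(bins, bins, bins) / n
    p_ab = p_abc.sum(axis=2)          # p(y_t, y_{t-1})
    p_bc = p_abc.sum(axis=0)          # p(y_{t-1}, x_{t-lag})
    p_b = p_abc.sum(axis=(0, 2))      # p(y_{t-1})
    # Sum p * log2[ p(abc) p(b) / (p(ab) p(bc)) ] over nonzero cells
    te = 0.0
    for ai, bi, ci in zip(*np.nonzero(p_abc)):
        te += p_abc[ai, bi, ci] * np.log2(
            p_abc[ai, bi, ci] * p_b[bi] / (p_ab[ai, bi] * p_bc[bi, ci]))
    return te
```

Scanning TE over a range of lags for each candidate driver (precipitation, snowpack, temperature) against discharge is what identifies the dominant lag timescales; the directional asymmetry (TE from driver to discharge exceeding the reverse) distinguishes drivers from responses.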