Hydrological forecasting is one of the key research areas in hydrology. Innovative forecasting tools can transform water resources management, flood early warning systems, and agricultural and hydropower management. In this study, we compared Stacked Long Short-Term Memory (S-LSTM), Bidirectional Long Short-Term Memory (Bi-LSTM), and Gated Recurrent Unit (GRU) networks with the classical Multilayer Perceptron (MLP) for one-step-ahead daily streamflow forecasting. The analysis used daily time series from the Borkena (Awash River basin) and Gummera (Abay River basin) streamflow stations. All datasets passed through rigorous quality control, and missing values were filled by linear interpolation. The partial autocorrelation function was applied to select appropriate time lags for generating the input series. The data were then split into training and testing sets at a ratio of 80:20. Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the coefficient of determination (R²) were used to evaluate model performance. The findings are summarized under three themes: model variability, lag time variability, and time series characteristics. Time series characteristics (climatic variability) had a greater impact on forecasting performance than either the number of lagged input time steps or variations in deep learning architecture. Accordingly, forecasts for the Borkena catchment were more accurate than those for the Gummera catchment, with RMSE, MAE, MAPE, and R² ranging over (0.81 to 1.53, 0.29 to 0.96, 0.16 to 1.72, 0.96 to 0.99) for Borkena and (17.43 to 17.99, 7.76 to 10.54, 0.16 to 1.03, 0.89 to 0.90) for Gummera. Although performance depended on the lag time, MLP and GRU performed nearly equally and outperformed S-LSTM and Bi-LSTM.
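The preprocessing and evaluation steps described above can be sketched in Python with NumPy. This is a minimal, hypothetical illustration, not the authors' code. The function names are invented for illustration. Partial autocorrelation is computed with the Durbin-Levinson recursion and compared against an approximate 95% bound of 1.96/√N. The input window is assumed to span every lag up to the largest significant one. The 80:20 split is assumed to be chronological (unshuffled). MAPE is reported in percent, and R² is taken as 1 − SS_res/SS_tot; some hydrology studies use the squared Pearson correlation instead. The demo uses a synthetic series and a persistence forecast in place of the trained networks.

```python
import numpy as np


def fill_gaps_linear(series):
    """Fill NaN gaps by linear interpolation over the time index."""
    x = np.asarray(series, dtype=float)
    idx = np.arange(x.size)
    ok = ~np.isnan(x)
    # np.interp holds the first/last observed value constant beyond the edges
    return np.interp(idx, idx[ok], x[ok])


def pacf(series, max_lag):
    """Sample partial autocorrelation (lags 0..max_lag) via Durbin-Levinson."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    # Sample autocorrelation r[k] for k = 0..max_lag
    r = np.array([np.dot(x[: x.size - k], x[k:]) / denom for k in range(max_lag + 1)])
    phi = np.zeros((max_lag + 1, max_lag + 1))
    out = np.ones(max_lag + 1)
    phi[1, 1] = out[1] = r[1]
    for k in range(2, max_lag + 1):
        prev = phi[k - 1, 1:k]  # phi_{k-1, j} for j = 1..k-1
        phi[k, k] = (r[k] - np.dot(prev, r[k - 1:0:-1])) / (1.0 - np.dot(prev, r[1:k]))
        phi[k, 1:k] = prev - phi[k, k] * prev[::-1]
        out[k] = phi[k, k]
    return out


def significant_lags(series, max_lag, z=1.96):
    """Lags whose |PACF| exceeds the approximate 95% bound z / sqrt(N)."""
    bound = z / np.sqrt(len(series))
    p = pacf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(p[k]) > bound]


def make_supervised(series, n_lags):
    """Rows of the n_lags previous values (oldest first) paired with the next value."""
    y = np.asarray(series, dtype=float)
    X = np.stack([y[i : i + n_lags] for i in range(y.size - n_lags)])
    return X, y[n_lags:]


def chrono_split(X, y, train_frac=0.8):
    """Chronological split without shuffling: the test period follows training."""
    n_train = int(round(train_frac * len(y)))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def evaluate(obs, sim):
    """RMSE, MAE, MAPE (percent) and R^2 = 1 - SS_res / SS_tot."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / obs))),  # inf/NaN if any obs == 0
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }


if __name__ == "__main__":
    # Synthetic positive AR(1) series standing in for daily streamflow (not the study's data)
    rng = np.random.default_rng(0)
    q = np.empty(1000)
    q[0] = 10.0
    for t in range(1, q.size):
        q[t] = 10.0 + 0.7 * (q[t - 1] - 10.0) + rng.normal()
    q[[10, 11, 500]] = np.nan  # simulated missing records
    q = fill_gaps_linear(q)
    lags = significant_lags(q, max_lag=10)
    X, y = make_supervised(q, max(lags) if lags else 1)
    X_tr, X_te, y_tr, y_te = chrono_split(X, y)
    # Persistence forecast (last observed value) as a placeholder for MLP/GRU/LSTM
    print(lags, evaluate(y_te, X_te[:, -1]))
```

In a full experiment, each network would be fitted on `X_tr, y_tr` and its one-step predictions on `X_te` scored against `y_te` with the same metrics.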