Short-term load forecasting (STLF) is vital for the daily operation of power grids. However, the nonlinearity, non-stationarity, and randomness characterizing electricity demand time series renders STLF a challenging task. To that end, different forecasting methods have been proposed in the literature for day-ahead load forecasting, including a variety of deep learning models that are currently considered to achieve state-of-the-art performance. In order to compare the accuracy of such models, we focus on national net aggregated STLF and examine well-established autoregressive neural networks of indicative architectures, namely multi-layer perceptrons, N-BEATS, long short-term memory neural networks, and temporal convolutional networks, for the case of Portugal. To investigate the factors that affect the performance of each model and identify the most appropriate per case, we also conduct a post-hoc analysis, correlating forecast errors with key calendar and weather features. Our results indicate that N-BEATS consistently outperforms the rest of the examined deep learning models. Additionally, we find that external factors can significantly impact accuracy, affecting both the actual and relative performance of the models.