Forecasting crop yields is becoming increasingly important in the current context, in which food security must be ensured despite the challenges posed by climate change, an expanding world population with rising incomes, increasing soil erosion, and decreasing water resources. Temperature, radiation, water availability, and other environmental conditions influence crop growth, development, and final grain yield in a complex, nonlinear manner. Machine learning (ML) techniques, and deep learning (DL) methods in particular, can account for such nonlinear relationships between yield and its covariates. However, they typically lack transparency and interpretability, since the way their predictions are derived is not directly evident. Yet in yield forecasting, understanding the factors underlying a predicted loss or gain is highly relevant. Here, we explore how to benefit from the higher predictive performance of DL methods while retaining the ability to interpret how the models arrive at their results. To do so, we applied a deep neural network to multivariate time series of vegetation and meteorological data to estimate wheat yield in the Indian Wheat Belt. We then visualized and analyzed the features and yield drivers learned by the model using regression activation maps. The DL model outperformed the other models tested (ridge regression and random forest) and facilitated the interpretation of the variables and processes that lead to yield variability. The learned features were mostly related to the length of the growing season and to temperature and light conditions during that period. For example, our results showed that high yields in 2012 were associated with low temperatures accompanied by sunny conditions during the growing period. The proposed methodology can be applied to other crops and regions to facilitate the use of DL models in agriculture.
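To illustrate how a regression activation map attributes a single yield estimate to individual time steps, the following is a minimal NumPy sketch. It assumes, as in the formulation that adapts class activation maps to regression, a 1D convolutional network whose last convolutional layer is followed by global average pooling over time and one linear output unit. The function names, array shapes, and random inputs are hypothetical illustrations and are not taken from the study described above.

```python
import numpy as np


def regression_activation_map(feature_maps, weights):
    """Compute a 1D regression activation map (RAM).

    Assumption: the network ends with global average pooling over time
    followed by a single linear output unit.

    feature_maps: shape (K, T), activations of the last convolutional
        layer (K channels over T time steps).
    weights: shape (K,), weights of the output unit applied to the
        pooled channels.
    Returns shape (T,): the contribution of each time step.
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over channels, evaluated separately at every time step
    return weights @ feature_maps


def predict_yield(feature_maps, weights, bias):
    """Hypothetical model head: global average pooling, then a linear unit."""
    pooled = np.asarray(feature_maps, dtype=float).mean(axis=1)
    return float(np.asarray(weights, dtype=float) @ pooled + bias)


# Hypothetical example: 4 channels over 12 time steps of one season
rng = np.random.default_rng(0)
A = rng.random((4, 12))
w = rng.normal(size=4)
b = 2.5

ram = regression_activation_map(A, w)
# Because pooling and the output unit are both linear, the time average
# of the map plus the bias reproduces the model's prediction.
print(ram.mean() + b, predict_yield(A, w, b))
# Time step that pushes the prediction upward the most
print(int(np.argmax(ram)))
```

Under this assumed architecture, positive values of the map mark periods that raise the predicted yield and negative values mark periods that lower it, which is what makes the map usable for reading off yield drivers over the season.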