Data-driven methodologies have been used in reservoir management and production forecasting, and have proven particularly effective for short-term oil production forecasts. However, there is room to improve their predictions, especially for challenging reservoirs such as the heterogeneous carbonate reservoirs of the Brazilian Pre-salt fields. Methods for oil production forecasting in the petroleum literature generally rely on linear correlations or recurrent neural networks (RNNs). In this paper, we propose a new strategy that improves short-term oil production forecasting by using attention mechanisms to boost state-of-the-art methods. Traditional data-driven techniques generally do not consider static data or planned activities. We address this gap by leveraging the Temporal Fusion Transformer (TFT) to integrate such information into our short-term forecasts. Transformers, the architecture underlying models such as ChatGPT, employ attention mechanisms to establish relationships between different points in a time series, assigning weights to these connections. We jointly explore oil, gas, and water production, pressure, and the ratios between them. The method includes static data (e.g., geographical coordinates) as well as known side information about the reservoir. Such side information can be, for instance, the predicted future production of another fluid or planned well shut-ins. We also investigate which side information improves the forecast. This paper presents two main findings. First, it shows how certain side information can improve the overall predictive capability of a model. For example, using predicted gas production as side information can significantly improve the oil production forecast. This is expected, since oil and gas production are closely connected. In the second application of TFT, we considered well closures as the side information.
We used an anomaly detection tool to identify well closures in the history period and converted them into side information usable by the TFT model. The distribution of these well closures then guides the prediction of our target, oil production. Because the side information is a distribution of well closures rather than their exact timing, we report these results in terms of cumulative oil production rather than daily rates. The forecast cumulative production is very close to the ground-truth data and closer than that of the linear and proposed baselines. This second result underscores the significance of incorporating side information within our TFT approach.
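
As an illustration of the second application, the following minimal sketch shows one way well closures in the history period could be turned into a binary side-information series, and how daily rates map to the cumulative production used for evaluation. It is an assumption-laden example, not the method used in this work: the simple rate threshold stands in for the anomaly detection tool, which is not specified here, and the names `flag_shut_ins`, `cumulative_production`, and `threshold` are hypothetical.

```python
from itertools import accumulate


def flag_shut_ins(oil_rate, threshold=1.0):
    """Mark days whose rate is at or below `threshold` as closures (1), else 0.

    Assumption: a plain threshold replaces the unspecified anomaly detector.
    """
    return [1 if q <= threshold else 0 for q in oil_rate]


def cumulative_production(oil_rate):
    """Running sum of daily rates, assuming one sample per day."""
    return list(accumulate(oil_rate))


# Hypothetical daily oil rates with a two-day shut-in.
daily_oil = [100.0, 98.0, 0.0, 0.0, 95.0, 94.0]

shut_in_flag = flag_shut_ins(daily_oil)       # candidate side-information series
cum_oil = cumulative_production(daily_oil)    # target for cumulative comparison
```

In a TFT setting, a series like `shut_in_flag` would typically be supplied as a known input alongside the static data, while `cum_oil` is the quantity compared against the ground truth.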