Forecasting methods are important decision support tools in geo-distributed sensor networks. However, challenges such as the multivariate nature of the data, the existence of multiple nodes, and the presence of spatio-temporal autocorrelation increase the complexity of the task. Existing forecasting methods are unable to address these challenges in a combined manner, resulting in suboptimal model accuracy. In this article, we propose GAP-LSTM, a novel geo-distributed forecasting method that leverages the synergistic interaction of graph convolution, attention-based long short-term memory (LSTM), 2-D convolution, and latent memory states to effectively exploit spatio-temporal autocorrelation in multivariate data generated by multiple nodes, resulting in improved modeling capabilities. Our extensive evaluation, involving real-world datasets from the traffic, energy, and pollution domains, shows that our method outperforms state-of-the-art forecasting methods. An ablation study confirms that every component of the method contributes positively to the accuracy of the forecasts. The method also provides an interpretable visualization that complements forecasts with additional insights for domain experts.