Accurate prediction of mesoscale eddy trajectories requires efficient models trained on large volumes of data to capture the main eddy characteristics. However, existing approaches lack salient attention mechanisms for extracting aggregated features from multi-dimensional eddy data sources. Deep learning techniques are also important for eddy trajectory prediction because they can dynamically capture the properties of mesoscale eddies in the South China Sea (SCS). In this paper, we propose a spatio-temporal attention-based deep learning framework that combines heterogeneous data integration with propagation trajectory forecasting. It consists of a novel autoencoder equipped with channel and spatial attention mechanisms (CSA-Encoder) and a gated recurrent unit (GRU) network with a temporal attention layer (TA-GRU). The CSA-Encoder compresses stereoscopic eddy data with convolutional layers and generates a compact, high-quality dataset that serves as the input to TA-GRU. The finer-grained TA-GRU then uses this more informative imagery to predict eddy trajectories over the next 14 days, with the temporal attention mechanism automatically selecting the relevant regions. Our cross-validation results show that, on average, our framework achieves a lower distance error (9 km) and a 54% performance improvement over the baseline GRU technique for one-day-ahead prediction, and outperforms two state-of-the-art techniques, long short-term memory (LSTM) and recurrent neural network (RNN), by 54.9% and 65.6%, respectively.
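
To illustrate the general idea behind a temporal attention layer placed on top of recurrent hidden states, the following is a minimal sketch in plain Python. It is not the TA-GRU implementation: the dot-product scoring, the `query` vector, and the function names `softmax` and `temporal_attention` are assumptions made for illustration only. The sketch scores each time step's hidden state against a query, normalizes the scores with a softmax, and returns the weighted sum as a context vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, query):
    """Weight T hidden states by their relevance to a query vector.

    hidden_states: list of T vectors (e.g. GRU outputs, one per time step).
    query: vector of the same dimension (hypothetical; in a trained model
           this would be a learned parameter or the latest hidden state).
    Returns (weights, context), where the weights sum to 1 and context is
    the attention-weighted sum of the hidden states.
    """
    # Assumed scoring function: a plain dot product per time step.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context
```

With a zero query every time step receives the same weight, so the context reduces to the mean of the hidden states; a query aligned with one time step's hidden state shifts weight toward that step.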