GPS-based localization and tracking of maneuvering targets is a crucial aspect of autonomous driving and is widely used in navigation, transportation, autonomous vehicles, and other fields. The classical tracking approach employs a Kalman filter with precise system parameters to estimate the state. However, the uncertainty of these parameters is difficult to model because of the complex motion of maneuvering targets and the unknown sensor characteristics. Furthermore, GPS data often contain unknown colored noise, making it challenging to obtain accurate system parameters, which can degrade the performance of classical methods. To address these issues, we present a Kalman-filter-based state estimation method that does not require predefined parameters but instead uses attention learning. We use a transformer encoder with a long short-term memory (LSTM) network to extract dynamic characteristics, and estimate the system model parameters online using the expectation-maximization (EM) algorithm, based on the output of the attention learning module. Finally, the Kalman filter computes the dynamic state estimates using the learned parameters of the system dynamics and measurement characteristics. On GPS simulation data and the Geolife Beijing vehicle GPS trajectory dataset, experimental results demonstrate that our method outperforms classical and purely model-free network estimation approaches in estimation accuracy, providing an effective solution for practical maneuvering-target tracking applications.
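For orientation, the classical baseline mentioned above can be sketched as a constant-velocity Kalman filter on a one-dimensional position signal. This is a minimal illustration and not the proposed method: the function name `kalman_cv_track`, the white-noise acceleration model, and the hand-picked values of `q`, `r`, and `p0` are all assumptions made for the example. They stand in for the predefined parameters that the classical approach needs and that are hard to choose for a maneuvering target.

```python
import random


def kalman_cv_track(measurements, dt=1.0, q=0.01, r=1.0, p0=100.0):
    """Track 1-D position with a constant-velocity Kalman filter.

    q (process-noise intensity) and r (measurement-noise variance) are
    hand-picked assumptions; p0 is the initial variance of each state.
    Returns a list of (position, velocity) estimates, one per measurement.
    """
    pos, vel = measurements[0], 0.0          # initial state guess
    p00, p01, p10, p11 = p0, 0.0, 0.0, p0    # initial covariance P
    # Process noise Q for a white-noise acceleration model (assumed).
    q00, q01, q11 = q * dt**4 / 4, q * dt**3 / 2, q * dt**2
    estimates = []
    for z in measurements:
        # Predict: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos, vel = pos + dt * vel, vel
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q00
        n01 = p01 + dt * p11 + q01
        n10 = p10 + dt * p11 + q01
        n11 = p11 + q11
        # Update with a position-only measurement, H = [1, 0].
        innovation = z - pos
        s = n00 + r                          # innovation variance
        k0, k1 = n00 / s, n10 / s            # Kalman gain
        pos += k0 * innovation
        vel += k1 * innovation
        # Covariance update: P = (I - K H) P.
        p00, p01 = (1 - k0) * n00, (1 - k0) * n01
        p10, p11 = n10 - k1 * n00, n11 - k1 * n01
        estimates.append((pos, vel))
    return estimates


if __name__ == "__main__":
    random.seed(0)
    # Synthetic target moving at 2 units per step, observed with unit-variance noise.
    noisy = [2.0 * k + random.gauss(0.0, 1.0) for k in range(50)]
    final_pos, final_vel = kalman_cv_track(noisy)[-1]
    print(f"final position {final_pos:.2f}, final velocity {final_vel:.2f}")
```

If `q` and `r` do not match the true motion and sensor noise, the gain `k0, k1` is mis-weighted and the estimates lag or jitter, which is the sensitivity that motivates estimating these parameters from data rather than fixing them in advance.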