When tracking maneuvering targets, recurrent neural networks (RNNs), especially long short-term memory (LSTM) networks, are widely applied to sequentially capture the motion states of targets from observations. However, LSTMs can only extract features of trajectories stepwise; thus, their modeling of maneuvering motion lacks globality. Meanwhile, trajectory datasets are often generated within a large, but fixed distance range. Therefore, the uncertainty of the initial position of targets increases the complexity of network training, and the fixed distance range reduces the generalization of the network to trajectories outside the dataset. In this study, we propose a transformer-based network (TBN) that consists of an encoder part (transformer layers) and a decoder part (one-dimensional convolutional layers), to track maneuvering targets. Assisted by the attention mechanism of the transformer network, the TBN can capture the long short-term dependencies of target states from a global perspective. Moreover, we propose a center–max normalization to reduce the complexity of TBN training and improve its generalization. The experimental results show that our proposed methods outperform the LSTM-based tracking network.