The rapid development of the sports industry in recent years has drawn increasing attention to complex sports scenes such as basketball and soccer, where tracking athletes in motion remains a key challenge. To address this challenge, we train on a large dataset of images captured by a visual perception system and propose a tracking model that incorporates a target search attention mechanism and integrates saliency heat maps for feature extraction and target information integration. The proposed method, SeaVit, significantly improves tracking performance over existing discriminative correlation filter (DCF)-based techniques. Experimental results show that it outperforms existing methods in both precision and real-time performance: with an average distance precision (DP) of 80.5, it achieves the highest DP among the compared trackers. The method has broad practical applications; in the future, it could be embedded in visual sensors and applied to athlete training to help improve training effectiveness and competition performance.
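For readers unfamiliar with the distance precision (DP) metric reported above, the following is a minimal sketch of how it is commonly computed in tracking benchmarks: the percentage of frames in which the predicted target center lies within a pixel threshold of the ground-truth center. The function name, the input layout, and the 20-pixel default threshold are illustrative assumptions; the exact evaluation protocol used for SeaVit is not stated here.

```python
import math

def distance_precision(pred_centers, gt_centers, threshold=20.0):
    """Percentage of frames whose center location error is within `threshold` pixels.

    pred_centers, gt_centers: equal-length sequences of (x, y) centers, one per frame.
    threshold: assumed pixel threshold (20 px is a common benchmark convention,
               not a value taken from this paper).
    """
    if len(pred_centers) != len(gt_centers):
        raise ValueError("prediction and ground truth must cover the same frames")
    if not pred_centers:
        return 0.0
    hits = 0
    for (px, py), (gx, gy) in zip(pred_centers, gt_centers):
        # Euclidean distance between predicted and ground-truth centers
        error = math.hypot(px - gx, py - gy)
        if error <= threshold:
            hits += 1
    return 100.0 * hits / len(pred_centers)


# Hypothetical three-frame sequence: center errors are 0, 15, and 50 pixels.
predicted = [(0, 0), (10, 10), (100, 100)]
ground_truth = [(0, 0), (10, 25), (100, 150)]
dp = distance_precision(predicted, ground_truth)
```

In this toy sequence, two of the three frames fall within the threshold. An average DP over a benchmark would then be the mean of such per-sequence values, under the same assumptions.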