In recent years, deep learning has been increasingly applied to computer vision tasks involving visual identification and tracking. Detecting and tracking the players and the ball in football video is a difficult but rewarding task, and its results can be used to study and visualize football tactics. Because targets in football video look alike and frequently occlude one another, traditional methods can often only segment targets such as players and the ball in an image; they either cannot track them at all or can track them only for a short time. Building on related research in computer vision and deep learning, this study develops a multi-camera system that can accurately track many targets in a football stadium over long periods of time. The main contributions of this paper are as follows: (1) A CNN for target displacement prediction is proposed. It no longer relies on the linear or quadratic motion models used previously, so the multitarget tracking algorithm can be applied to a wider range of scenes. (2) For the first time in a multitarget tracking algorithm, a continuous conditional random field is used to model the asymmetric relationships between targets. In addition, the displacement-prediction CNN can be cascaded with the continuous conditional random field and trained end to end, which greatly reduces the training difficulty. The experimental parameter settings in this paper are simple, and comprehensive, systematic experiments verify the validity and correctness of this work from several aspects.
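For context, the linear and quadratic motion models that the displacement-prediction CNN replaces are commonly written as constant-velocity and constant-acceleration extrapolations. The following is a minimal sketch under that assumption, not the paper's implementation: it assumes each track is a list of 2D positions sampled once per frame, and the names `predict_linear` and `predict_quadratic` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def predict_linear(track: Sequence[Point]) -> Point:
    """Constant-velocity guess: next = last + (last - previous)."""
    if len(track) < 2:
        raise ValueError("need at least two positions")
    (x1, y1), (x2, y2) = track[-2], track[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def predict_quadratic(track: Sequence[Point]) -> Point:
    """Constant-acceleration guess from the last three positions.

    Extrapolating the quadratic through p0, p1, p2 by one step
    gives next = 3*p2 - 3*p1 + p0.
    """
    if len(track) < 3:
        raise ValueError("need at least three positions")
    (x0, y0), (x1, y1), (x2, y2) = track[-3], track[-2], track[-1]
    return (3 * x2 - 3 * x1 + x0, 3 * y2 - 3 * y1 + y0)


# Hypothetical track of an accelerating player (positions t^2 for t = 0, 1, 2).
track = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
linear_guess = predict_linear(track)        # assumes velocity stays constant
quadratic_guess = predict_quadratic(track)  # assumes acceleration stays constant
```

Both predictors depend only on a fixed kinematic assumption about one target, so they cannot account for abrupt direction changes or interactions between players, which is the limitation a learned displacement predictor aims to address.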