The safe landing and rapid recovery of reentry capsules are critical to manned spacecraft missions. Uncertain factors such as flight control accuracy and wind speed lead to low orbit-prediction accuracy and a large landing range for reentry capsules. It is therefore necessary to track the reentry capsule autonomously and observe it continuously by video during the low-altitude phase. For the Shenzhou return capsule landing mission, this paper proposes a new approach to the autonomous tracking of Shenzhou reentry capsules based on video detection and heterogeneous UAV swarms. A multi-scale video target detection algorithm based on deep learning is developed to recognize the reentry capsule and obtain positioning data. A self-organizing control method based on a virtual potential field is proposed to realize the cooperative flight of the UAV swarm. A hardware-in-the-loop simulation system is established to verify the method. The results show that the reentry capsule can be detected in four different states, and that the detection accuracy for the capsule with parachute is 99.5%. Using the position obtained by video detection, the UAV swarm effectively achieves autonomous tracking of the Shenzhou reentry capsule. This is of great significance for the real-time search for reentry capsules and for ensuring astronauts’ safety.