Moving target tracking is a technology that associates a target across video frames and images based on its characteristics. It is widely used in intelligent transportation, logistics, public security, sports broadcasting, and other fields. Existing research focuses primarily on refining target detection and tracking algorithms to raise retrieval and tracking efficiency. However, most studies perform global, full-range retrieval, and in large video scenes where multiple cameras collaborate, these methods rarely consider the efficiency of target retrieval and tracking. Drawing on the theories and methods of video GIS, set theory, and topology, this paper constructs a set, together with its topological space, that covers road networks, cameras, videos, and key frames. On this basis, the positioning, tracking, and trajectory representation of a moving target are solved within the set and its topological space. Experimental results show that, compared with the feature vector algorithm, video summarization, and the Meanshift algorithm, the proposed method improves target retrieval performance, algorithm stability, and robustness.