Existing multi-target multi-camera tracking (MTMCT) datasets are small in terms of the number of identities and video length. Creating new real-world datasets is hard because privacy has to be guaranteed and labeling is tedious. Therefore, in this work a mod for GTA V has been developed and used to record a simulated MTMCT dataset called Multi Camera Track Auto (MTA). The MTA dataset contains over 2,800 person identities, 6 cameras, and a video length of over 100 minutes per camera. Additionally, an MTMCT system has been implemented to provide a baseline for the created dataset. The system's pipeline consists of stages for person detection, person re-identification, single-camera multi-target tracking, track distance calculation, and track association. The track distance calculation is a weighted aggregation of the following distances: a single-camera time constraint, a multi-camera time constraint using overlapping camera areas, an appearance feature distance, a homography matching with pairwise camera homographies, and a linear prediction based on the velocity and the time difference of tracks. When using all partial distances, we were able to surpass the results of state-of-the-art single-camera trackers by 13% in IDF1 score. The MTA dataset, code, and baselines are available at github.com/schuar-iosb/mta-dataset.
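The weighted aggregation of partial track distances described in this abstract can be illustrated with a minimal sketch. The function names, the dictionary keys, and the weights below are hypothetical and chosen only for illustration; the abstract does not specify how the authors weight or normalize the individual distances, and the linear prediction shown here simply assumes constant velocity in a shared ground-plane coordinate system.

```python
import numpy as np


def aggregate_distances(partials, weights):
    """Weighted sum of partial track-distance matrices.

    partials: dict mapping a (hypothetical) distance name to an
              (n_tracks x n_tracks) array; all arrays share one shape.
    weights:  dict mapping the same names to scalar weights (assumed values).
    """
    names = sorted(partials)
    total = np.zeros_like(partials[names[0]], dtype=float)
    for name in names:
        total += weights[name] * partials[name]
    return total


def linear_prediction_distance(pos_a, vel_a, t_a, pos_b, t_b):
    """Distance between track b's position at time t_b and track a's
    position extrapolated from time t_a to t_b at constant velocity."""
    predicted = np.asarray(pos_a, float) + np.asarray(vel_a, float) * (t_b - t_a)
    return float(np.linalg.norm(predicted - np.asarray(pos_b, float)))


# Two tracks, two partial distances (toy values, not from the paper).
partials = {
    "appearance": np.array([[0.0, 0.2], [0.2, 0.0]]),
    "homography": np.array([[0.0, 1.0], [1.0, 0.0]]),
}
weights = {"appearance": 0.5, "homography": 2.0}
total = aggregate_distances(partials, weights)
```

A track association stage would then operate on `total`, for example by clustering tracks whose aggregated distance falls below a threshold; that step is left out here because the abstract does not describe it in detail.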
Multi-camera tracking of vehicles on a city-scale level is a crucial task for efficient traffic monitoring. Most of the errors made by such multi-target multi-camera tracking systems arise from tracking failures or misleading visual information of detection boxes under occlusion. Therefore, we propose an occlusion-aware approach that leverages temporal information from tracks to improve the single-camera tracking performance through an occlusion handling strategy and additional modules to filter false detections. For the multi-camera tracking, we discard obstacle-occluded detection boxes by a background filtering technique and boxes overlapping with other targets using the available track information to improve the quality of extracted visual features. Furthermore, topological and temporal constraints are incorporated to simplify the re-identification task in the multi-camera clustering. We give detailed insights into our method with ablative experiments and show its competitiveness on the CityFlowV2 dataset, where we achieve promising results ranking 4th in Track 3 of the 2021 AI City Challenge.
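The idea of discarding boxes that overlap with other targets before feature extraction can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which overlap measure or threshold the authors use, so intersection over union (IoU) and the `max_iou` value of 0.3 are hypothetical choices, as are the function names.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_unoccluded(boxes, max_iou=0.3):
    """Return indices of boxes whose IoU with every other box in the same
    frame is at most max_iou (an assumed threshold, not from the paper)."""
    return [
        i
        for i, a in enumerate(boxes)
        if all(iou(a, b) <= max_iou for j, b in enumerate(boxes) if j != i)
    ]


# Toy frame: the first two boxes overlap heavily, the third is isolated.
boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
kept = keep_unoccluded(boxes)
```

Only the detections in `kept` would be passed on to the appearance feature extractor, so that features are not contaminated by pixels belonging to a neighbouring vehicle.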