In this paper, we propose a novel framework for multi-target multi-camera tracking (MTMCT) of vehicles based on metadata-aided re-identification (MA-ReID) and a trajectory-based camera link model (TCLM). Given a video sequence and the corresponding frame-by-frame vehicle detections, we first address the isolated-tracklet problem in single-camera tracking (SCT) with the proposed traffic-aware single-camera tracking (TSCT). Then, after automatically constructing the TCLM, we solve MTMCT with MA-ReID. The TCLM is generated from the topological configuration of the cameras and supplies spatial and temporal information that improves MTMCT performance by narrowing the ReID candidate search. We also use a temporal attention model to create more discriminative trajectory embeddings for each camera, which yields robust distance measures for vehicle ReID. Moreover, we train a metadata classifier for MTMCT to obtain a metadata feature, which is concatenated with the temporal-attention-based embeddings. Finally, the TCLM and hierarchical clustering are jointly applied for global ID assignment. The proposed method is evaluated on the CityFlow dataset, achieving an IDF1 of 76.77%, which outperforms state-of-the-art MTMCT methods.
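The last two steps of this abstract (concatenating the metadata feature with the appearance embedding, then combining the camera link model with hierarchical clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The field names (`cam`, `t_in`, `t_out`, `appearance`, `metadata`), the `links` table of allowed transit-time windows, the cosine distance, the complete-linkage rule, and the threshold are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def pair_distance(t1, t2, links):
    """Feature distance, or inf when the (hypothetical) camera link model rules the pair out."""
    if t1["cam"] == t2["cam"]:
        return math.inf  # cross-camera matching only
    # Order the two trajectories in time, then look up the directed camera link.
    first, second = (t1, t2) if t1["t_out"] <= t2["t_in"] else (t2, t1)
    window = links.get((first["cam"], second["cam"]))
    if window is None:
        return math.inf  # cameras are not connected
    transit = second["t_in"] - first["t_out"]
    if not (window[0] <= transit <= window[1]):
        return math.inf  # transit time outside the allowed window
    # Concatenate appearance embedding and metadata feature (list concatenation).
    f1 = t1["appearance"] + t1["metadata"]
    f2 = t2["appearance"] + t2["metadata"]
    return cosine_distance(f1, f2)

def assign_global_ids(trajs, links, threshold):
    """Agglomerative clustering with complete linkage; one forbidden pair blocks a merge."""
    clusters = [[i] for i in range(len(trajs))]

    def linkage(c1, c2):
        return max(pair_distance(trajs[i], trajs[j], links) for i in c1 for j in c2)

    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage(clusters[a], clusters[b])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:
            break  # no remaining pair is close enough to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    ids = [0] * len(trajs)
    for gid, members in enumerate(clusters):
        for i in members:
            ids[i] = gid
    return ids

# Toy data: camera 1 feeds camera 2 with a transit time of 5 to 60 seconds.
links = {(1, 2): (5, 60)}
trajs = [
    {"cam": 1, "t_in": 0, "t_out": 10, "appearance": [1.0, 0.0], "metadata": [1.0, 0.0]},
    {"cam": 2, "t_in": 25, "t_out": 35, "appearance": [0.9, 0.1], "metadata": [1.0, 0.0]},
    {"cam": 2, "t_in": 300, "t_out": 310, "appearance": [1.0, 0.0], "metadata": [1.0, 0.0]},
    {"cam": 1, "t_in": 5, "t_out": 12, "appearance": [0.0, 1.0], "metadata": [0.0, 1.0]},
]
ids = assign_global_ids(trajs, links, threshold=0.3)
print(ids)
```

In this toy setup the third trajectory looks identical to the first but arrives far outside the transit window, so the link constraint keeps it in its own cluster. This is the sense in which a camera link model can reduce the ReID candidate search.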
Recent deep learning research has produced breakthroughs in performance that traditional machine learning algorithms did not achieve. In object detection in particular, deep learning methods are now used in commercial products that reach high accuracy in real environments. However, object detection with a convolutional neural network (CNN) has the disadvantage that a large number of feature maps must be generated to be robust against scale changes and occlusion of the object. Moreover, simply increasing the number of feature maps does not improve performance. In this paper, we propose integrating additional prediction layers into conventional Yolo-v3 using spatial pyramid pooling to improve detection accuracy for vehicles that undergo large scale changes or are occluded by other objects. Our proposed detector achieves 85.29% mAP, outperforming DPM, ACF, R-CNN, CompACT, NANO, EB, GP-FRCNN, SA-FRCNN, Faster-R CNN2, HAVD, and SSD-VDIG on the UA-DETRAC benchmark dataset, which consists of challenging real-world traffic videos.
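For readers unfamiliar with spatial pyramid pooling, the following is a minimal pure-Python sketch of one common variant: stride-1 max pooling at several window sizes, with the pooled maps concatenated to the input along the channel axis. The abstract does not specify the exact configuration, so the kernel sizes (5, 9, 13), the function names, and the border handling are assumptions and not the authors' design.

```python
def max_pool_same(channel, k):
    """Stride-1 max pooling with a k x k window.

    The window is clipped at the borders, which is equivalent to padding
    with negative infinity, so the output has the same H x W as the input.
    """
    h, w = len(channel), len(channel[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                channel[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out

def spp_block(feature_map, kernel_sizes=(5, 9, 13)):
    """feature_map is a list of channels, each an H x W list of numbers.

    Returns the input channels followed by one pooled copy of every channel
    per kernel size, i.e. C * (1 + len(kernel_sizes)) channels in total.
    """
    out = list(feature_map)
    for k in kernel_sizes:
        out.extend(max_pool_same(ch, k) for ch in feature_map)
    return out

# Toy single-channel 3 x 3 feature map.
fmap = [[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]]
pooled3 = max_pool_same(fmap[0], 3)
stacked = spp_block(fmap)
print(pooled3)
print(len(stacked))
```

Because every pooled map keeps the input resolution, the concatenated output mixes receptive fields of several sizes at each location, which is the property that is generally credited with helping on large scale changes.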