End-to-End Learning Deep CRF Models for Multi-Object Tracking

Xiang, Jun; Xu, Guohan; Ma, Chao; Hou, Jian

doi:10.1109/tcsvt.2020.2975842

Cited by 64 publications

(23 citation statements)

References 57 publications

“…In order to improve orientation predictions from the video, we plan to integrate attention mechanisms (in the spirit of [45]) that estimate the orientation only for those detections of a tracklet that do not contain impaired visual information, such as partial occlusions or motion blur. We further plan to transform our proposed tracker to an end-to-end trainable tracking system, inspired by the current progress in this direction [79], [80] for other tracking systems. While we demonstrated that a fusion of video data with IMU signals improves multiple people tracking systems, the same concept could be applied to track other objects, which would extend our setup to VIMOT (Video Inertial Multi-Object Tracking).…”

Section: Discussion (mentioning)

confidence: 99%

Accurate Long-Term Multiple People Tracking Using Video and Body-Worn IMUs

Henschel, Marcard, Rosenhahn (2020). IEEE Trans. on Image Process.

Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person's outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors.
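The abstract reports an average IDF1 score of 91.2%. As a reference point, here is a minimal Python sketch of the standard identity-based F1 measure, IDF1 = 2 IDTP / (2 IDTP + IDFP + IDFN), where IDTP comes from an optimal one-to-one assignment between ground-truth and predicted trajectories. It assumes per-frame box matching has already been done and summarised in a hypothetical `overlap` table that counts, for each (ground-truth id, predicted id) pair, the frames in which the two boxes match. The assignment is brute-forced, which is only practical for a handful of tracks; it is not the evaluation code used in the paper.

```python
from itertools import permutations


def best_idtp(overlap, gt_ids, pred_ids):
    """Max total matched frames over one-to-one GT-to-prediction identity assignments.

    overlap maps (gt_id, pred_id) -> number of frames in which the two boxes match
    (hypothetical input format). Brute force: exponential, toy sizes only.
    """
    best = 0
    if len(gt_ids) <= len(pred_ids):
        # assign every GT trajectory to a distinct predicted trajectory
        for perm in permutations(pred_ids, len(gt_ids)):
            best = max(best, sum(overlap.get((g, p), 0) for g, p in zip(gt_ids, perm)))
    else:
        # assign every predicted trajectory to a distinct GT trajectory
        for perm in permutations(gt_ids, len(pred_ids)):
            best = max(best, sum(overlap.get((g, p), 0) for g, p in zip(perm, pred_ids)))
    return best


def idf1(idtp, num_gt_boxes, num_pred_boxes):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    idfn = num_gt_boxes - idtp    # GT boxes not covered by their assigned identity
    idfp = num_pred_boxes - idtp  # predicted boxes not covering their assigned identity
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("no boxes to evaluate")
    return 2 * idtp / denom


# Toy example: two GT tracks and two predicted tracks, 10 boxes each.
overlap = {("A", 1): 8, ("A", 2): 2, ("B", 2): 7, ("B", 1): 1}
idtp = best_idtp(overlap, ["A", "B"], [1, 2])
print(idtp, idf1(idtp, num_gt_boxes=20, num_pred_boxes=20))  # prints: 15 0.75
```

Because IDFP and IDFN are both defined relative to IDTP, the denominator always reduces to the total number of ground-truth boxes plus predicted boxes.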

“…Alternatively, Xiang et al. (2020) use the MHT framework (Reid 1979) to link tracklets, while iteratively re-evaluating appearance/motion models based on progressively merged tracklets. This approach is one of the top-performing methods on MOT17, achieving 54.87% MOTA.…”

Section: Learning To Combine Association Cues (mentioning)

confidence: 99%

MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking

Dendorfer, Ošep, et al. (2020)

Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched in late 2014, to collect existing and new data and create a framework for the standardized evaluation of multiple object tracking methods. The benchmark is focused on multiple people tracking, since pedestrians are by far the most studied object in the tracking community, with applications ranging from robot navigation to self-driving cars. This paper collects the first three releases of the benchmark: (i) MOT15, along with numerous state-of-the-art results that were submitted in recent years, (ii) MOT16, which contains new challenging videos, and (iii) MOT17, which extends MOT16 sequences with more precise labels and evaluates tracking performance on three different object detectors. The second and third releases not only offer a significant increase in the number of labeled boxes, but also provide labels for multiple object classes besides pedestrians, as well as the level of visibility for every single object of interest. We finally provide a categorization of state-of-the-art trackers and a broad error analysis. This will help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.
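The citation statement for this entry quotes a MOTA of 54.87% on MOT17. A minimal Python sketch of the CLEAR MOT accuracy score as it is commonly defined, MOTA = 1 - (FN + FP + IDSW) / GT with every term summed over all frames, follows. The per-frame counts are assumed to come from a prior box-matching step that is not shown, and the `FrameCounts` container and the input format of `count_id_switches` are hypothetical illustrations, not the benchmark's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class FrameCounts:
    fn: int      # missed ground-truth objects in this frame
    fp: int      # predicted boxes matched to no ground-truth object
    idsw: int    # identity switches in this frame
    num_gt: int  # ground-truth objects present in this frame


def count_id_switches(per_frame_matches):
    """Count switches given, per frame, a dict gt_id -> matched pred_id.

    A switch is counted when a GT object is matched to a different predicted id
    than the one it was last matched to (possibly several frames earlier).
    """
    last_match = {}
    switches = 0
    for matches in per_frame_matches:
        for gt_id, pred_id in matches.items():
            if gt_id in last_match and last_match[gt_id] != pred_id:
                switches += 1
            last_match[gt_id] = pred_id
    return switches


def mota(frames):
    """MOTA = 1 - (sum FN + sum FP + sum IDSW) / sum GT; can be negative."""
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(f.fn + f.fp + f.idsw for f in frames)
    return 1.0 - errors / total_gt


# Object "a" is tracked as 1, missed for a frame, then tracked as 2: one switch.
print(count_id_switches([{"a": 1}, {"a": 1}, {}, {"a": 2}]))
print(mota([FrameCounts(fn=1, fp=0, idsw=0, num_gt=10),
            FrameCounts(fn=0, fp=2, idsw=1, num_gt=10)]))
```

Since false positives are not bounded by the number of ground-truth objects, MOTA has no lower bound, which is why the sketch does not clamp it at zero.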

“…They performed optimization and achieved favorable results. In [34][35][36][37][38], deep learning technology has been further applied to the conditional random field tracking model in order to improve the discriminability of object features. In [6,11], a larger range of node relationships was considered and a hypergraph model was established to address the data association problem.…”

Section: Related Work (mentioning)

confidence: 99%

Multiple Object Tracking for Dense Pedestrians by Markov Random Field Model with Improvement on Potentials

Liu, Li, Wang, et al. (2020). Sensors

Pedestrian tracking in dense crowds is a challenging task, even when using a multi-camera system. In this paper, a new Markov random field (MRF) model is proposed for the association of tracklet couplings. Equipped with a new potential function improvement method, this model can associate the small tracklet coupling segments caused by dense pedestrian crowds. The tracklet couplings in this paper are obtained through a data fusion method based on image mutual information. This method calculates the spatial relationships of tracklet pairs by integrating position and motion information, and adopts the human key point detection method for correction of the position data of incomplete and deviated detections in dense crowds. The MRF potential function improvement method for dense pedestrian scenes includes assimilation and extension processing, as well as a message selective belief propagation algorithm. The former enhances the information of the fragmented tracklets by means of a soft link with longer tracklets and expands through sharing to improve the potentials of the adjacent nodes, whereas the latter uses a message selection rule to prevent unreliable messages of fragmented tracklet couplings from being spread throughout the MRF network. With the help of the iterative belief propagation algorithm, the potentials of the model are improved to achieve valid association of the tracklet coupling fragments, such that dense pedestrians can be tracked more robustly. Modular experiments and system-level experiments are conducted using the PETS2009 experimental data set, where the experimental results reveal that the proposed method has superior tracking performance.
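The abstract describes a message-selective belief propagation step that keeps unreliable messages from fragmented tracklet couplings from spreading through the MRF. The paper's actual selection rule is not given here, so the following is only a hypothetical Python sketch of the general idea: standard sum-product belief propagation on a pairwise MRF with discrete labels, in which nodes flagged as unreliable keep their outgoing messages uniform instead of propagating their own evidence. The `reliable` flags, the single shared `pairwise` potential, and the toy potentials are all assumptions made for illustration.

```python
def selective_bp(unary, pairwise, edges, reliable, iters=10):
    """Sum-product BP on a pairwise MRF; unreliable nodes send uniform messages.

    unary:    dict node -> list of positive potentials, one per label
    pairwise: function (label_i, label_j) -> positive potential (shared by all edges)
    edges:    list of undirected (i, j) pairs
    reliable: dict node -> bool (hypothetical selection flag)
    Returns normalized beliefs per node.
    """
    nbrs = {n: [] for n in unary}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msgs[(i, j)][x_j]: message from node i to node j, initialised uniform
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = [1.0] * len(unary[j])
        msgs[(j, i)] = [1.0] * len(unary[i])
    for _ in range(iters):
        new_msgs = {}
        for (i, j), old in msgs.items():
            if not reliable[i]:
                new_msgs[(i, j)] = old  # selection rule: do not spread i's evidence
                continue
            out = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    term = unary[i][xi] * pairwise(xi, xj)
                    for k in nbrs[i]:
                        if k != j:  # product of incoming messages except from j
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = sum(out)
            new_msgs[(i, j)] = [v / z for v in out]
        msgs = new_msgs
    beliefs = {}
    for n in unary:
        b = list(unary[n])
        for k in nbrs[n]:
            b = [bv * mv for bv, mv in zip(b, msgs[(k, n)])]
        z = sum(b)
        beliefs[n] = [v / z for v in b]
    return beliefs


def smooth(x, y):
    """Toy pairwise potential that favours equal labels on an edge."""
    return 2.0 if x == y else 1.0


# Toy chain a - b - c with two labels.
unary = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [0.2, 0.8]}
edges = [("a", "b"), ("b", "c")]
full = selective_bp(unary, smooth, edges, {"a": True, "b": True, "c": True})
sel = selective_bp(unary, smooth, edges, {"a": True, "b": True, "c": False})
print(full["b"], sel["b"])
```

With node c's outgoing messages frozen, b's belief in label 0 rises from 38/71 (about 0.535) to 19/30 (about 0.633), because c's contrary evidence is no longer propagated. Freezing messages at uniform is just one simple way to realise a selection rule; the rule in the cited work may differ.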
