2021
DOI: 10.1109/tcsvt.2020.2975842

End-to-End Learning Deep CRF Models for Multi-Object Tracking Deep CRF Models

Abstract: Existing deep multi-object tracking (MOT) approaches first learn a deep representation to describe target objects and then associate detection results by optimizing a linear assignment problem. Despite demonstrated successes, it is challenging to discriminate target objects under mutual occlusion or to reduce identity switches in crowded scenes. In this paper, we propose learning deep conditional random field (CRF) networks, aiming to model the assignment costs as unary potentials and the long-term dependencie…

Cited by 64 publications (23 citation statements)
References 57 publications

“…In order to improve orientation predictions from the video, we plan to integrate attention mechanisms (in the spirit of [45]) that estimate the orientation only for those detections of a tracklet that do not contain impaired visual information, such as partial occlusions or motion blur. We further plan to transform our proposed tracker to an end-to-end trainable tracking system, inspired by the current progress in this direction [79], [80] for other tracking systems. While we demonstrated that a fusion of Video data with IMU signals improves multiple people tracking systems, the same concept could be applied to track other objects, which would extend our setup to VIMOT (Video Inertial Multi-Object Tracking).…”
Section: Discussion (mentioning)
confidence: 99%
“…Alternatively, (Xiang et al 2020) uses MHT framework (Reid 1979) to link tracklets, while iteratively re-evaluating appearance/motion models based on progressively merged tracklets. This approach is one of the top on MOT17, achieving 54.87% MOTA.…”
Section: Learning To Combine Association Cues (mentioning)
confidence: 99%
“…They performed optimization and achieved favorable results. In [34][35][36][37][38], deep learning technology has been further applied to the conditional random field tracking model, in order to improve the distinction degree of object features. In [6,11], a larger range of node relationships were considered and a hypergraph model was established to address the data association problem.…”
Section: Related Work (mentioning)
confidence: 99%