2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00141
Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers

Abstract: Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task which has been extensively studied for decades. Most of the existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches which largely reduce the human effort to tune algorithm parameters. However, the commonly used supervised learning approaches require the labeled data (e.g., bounding boxes), which is expensive for videos. Also, the TBD framework is usually s…

Cited by 37 publications (61 citation statements) · References 49 publications
“…position, size, appearance) over time. Similar extensions are also provided by (Hsieh et al. 2018; He et al. 2018). In an orthogonal direction, Spatially Invariant Attend, Infer, Repeat (SPAIR) (Crawford and Pineau 2019) improved on AIR's ability to handle cluttered scenes by replacing AIR's recurrent encoder network with a convolutional network and a spatially local object specification scheme.…”
Section: Related Work
confidence: 92%
“…We also experimented with Tracking by Animation (TbA) (He et al. 2018), but were unable to obtain good tracking performance on these densely cluttered videos. One relevant point is that TbA lacks a means of encouraging the network to explain scenes using few objects, and we found that TbA often used several internal objects to explain a single object in the video; in contrast, both SILOT and SQAIR use priors on o_pres which encourage o_pres to be near 0, forcing the networks to use internal objects efficiently.…”
Section: Scattered MNIST
confidence: 99%
“…He et al. proposed an end-to-end tracking framework [116] that learns from unlabeled data; this framework combines Reprioritized Attentive Tracking with Tracking-By-Animation. Lee and Kim proposed a Feature Pyramid Siamese Network (FPSN) [117] to extract multi-level feature information and to add spatio-temporal motion features to consider both appearance and motion information.…”
Section: Motion Variations
confidence: 99%
“…They tracked objects by fusing trajectory dynamics information, and proposed a novel two-step data association framework. He et al. [26] proposed a tracking-by-animation framework to achieve both label-free and end-to-end learning for MOT, unlike tracking-by-detection frameworks, which isolate the detection task from the tracking task. Their differentiable neural network first tracks objects in input frames, and then animates the tracked objects in reconstructed frames.…”
Section: Related Work
confidence: 99%