2020
DOI: 10.1609/aaai.v34i04.5777

Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

Abstract: The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects without supervision (i.e. without access to annotated training videos) since this will allow agents to begin operating in new environments with minimal human assistance. The task of learning to discover and track objects in videos, which we call unsupervised object tracking, ha…

Cited by 26 publications (23 citation statements); references 10 publications.
“…In SQ-AIR, this work is extended to sequences of images, and a discovery and propagation mechanism was introduced to track objects through different frames (Kosiorek et al., 2018). These have been extended to better handle physical interactions (Kossen et al., 2020) or be more scalable (Crawford and Pineau, 2020; Jiang et al., 2020). These extensions have also been combined by Lin et al. (2020).…”
Section: Discussion (mentioning, confidence: 99%)
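The discovery and propagation mechanism mentioned above can be illustrated with a toy, non-probabilistic sketch: at each frame, existing tracks are first propagated, then new tracks are discovered for detections that no propagated track explains. Everything here is an assumption for illustration: the `Track` class, the constant-velocity `propagate` step (standing in for a learned propagation model), and the distance-threshold `discover` step (standing in for learned discovery) are hypothetical and are not SQAIR's actual variational formulation, which also handles object disappearance.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def propagate(tracks):
    # Hypothetical constant-velocity update; a stand-in for a learned propagation model.
    return [Track(t.track_id, t.x + t.vx, t.y + t.vy, t.vx, t.vy) for t in tracks]


def discover(tracks, detections, next_id, radius=5.0):
    # Start a new track for every detection not within `radius` of a propagated track.
    new_tracks = list(tracks)
    for dx, dy in detections:
        explained = any(math.hypot(dx - t.x, dy - t.y) <= radius for t in tracks)
        if not explained:
            new_tracks.append(Track(next_id, dx, dy))
            next_id += 1
    return new_tracks, next_id


def step(tracks, detections, next_id):
    # Propagation first, then discovery, mirroring the two-stage structure described above.
    return discover(propagate(tracks), detections, next_id)


# Frame 1 discovers one object; frame 2 keeps it and discovers a second one.
tracks, next_id = step([], [(10.0, 10.0)], 0)
tracks, next_id = step(tracks, [(10.0, 10.0), (50.0, 50.0)], next_id)
print([t.track_id for t in tracks])
```

Running propagation before discovery means that only genuinely new objects receive fresh identities, which is what keeps identities consistent across frames.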
“…Spatially invariant attend-infer-repeat (SPAIR) models [7] used CNN feature extractors and a local spatial object specification scheme to conduct UMOT video processing with variational inference. Building upon the SPAIR method, spatially invariant label-free object tracking (SILOT) [4] incorporated a VAE-based architecture with competitive MOT accuracy on MNIST-MOT and Atari video games. However, previous methods and benchmark models do not provide experimental studies on the effect of background noise on tracker performance.…”
Section: Unsupervised Multi-object Tracking (mentioning, confidence: 99%)
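The "local spatial object specification scheme" can be sketched as follows: each cell of a CNN feature grid predicts an object position as an offset relative to that cell, so the same per-cell predictor works anywhere in the image (spatial invariance). This is a minimal sketch under stated assumptions: the function name `cell_offsets_to_centres`, the sigmoid squashing, and confining each centre to its own cell are illustrative choices; SPAIR's actual parameterization may differ (for example, it may allow centres to extend somewhat beyond the cell).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def cell_offsets_to_centres(offset_logits, cell_h, cell_w):
    """Convert per-cell raw offsets into global object centres.

    offset_logits[i][j] = (oy, ox): raw outputs for grid cell (row i, column j).
    Hypothetical scheme: the sigmoid maps each offset into (0, 1), so every
    predicted centre stays inside the cell that produced it.
    """
    centres = []
    for i, row in enumerate(offset_logits):
        for j, (oy, ox) in enumerate(row):
            cy = (i + sigmoid(oy)) * cell_h  # global y = cell row + local offset
            cx = (j + sigmoid(ox)) * cell_w  # global x = cell column + local offset
            centres.append((cy, cx))
    return centres


# A 2 x 2 grid of 16-pixel cells with zero logits places each centre mid-cell.
print(cell_offsets_to_centres([[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]], 16, 16))
```

Because each cell only has to describe objects near itself, the cost per cell is constant and the total work grows with image area rather than with a global object count, which is the scalability argument behind spatially invariant models.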
“…As shown in Figure 3, we select two baseline video datasets from the previous UMOT studies: (a) MNIST-MOT [3,7] containing 2M training frames and 25k validation frames, where each frame is of size 128 × 128 × 1 and (b) Atari gaming video on Space-Invader (Atari-SI) [4,7] containing 128k training frames and 1k validation frames, where each frame is converted to gray-scale with an input size of 210 × 160 × 1 for testing.…”
Section: Datasets and Baseline Setup (mentioning, confidence: 99%)
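The gray-scale conversion of Atari-SI frames to 210 × 160 × 1 can be illustrated with a short sketch. The citing work does not say which conversion it uses; the ITU-R BT.601 luma weights below (0.299, 0.587, 0.114) are an assumption, and `to_grayscale` is a hypothetical helper that keeps a trailing channel axis of size 1 to match the stated input shape.

```python
def to_grayscale(frame_rgb):
    """Convert an H x W x 3 RGB frame (nested lists) into an H x W x 1 frame.

    Assumption: BT.601 luma weights; the original preprocessing may differ.
    """
    return [
        [[0.299 * r + 0.587 * g + 0.114 * b] for (r, g, b) in row]
        for row in frame_rgb
    ]


# A white 210 x 160 RGB frame becomes a 210 x 160 x 1 frame of (approximately) 255.
white = [[(255, 255, 255)] * 160 for _ in range(210)]
gray = to_grayscale(white)
print(len(gray), len(gray[0]), len(gray[0][0]))
```

Collapsing three channels to one reduces each frame's input size by a factor of three while preserving the luminance structure that distinguishes sprites from the background.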