2020
DOI: 10.1007/978-3-030-58558-7_26
TAO: A Large-Scale Benchmark for Tracking Any Object

Abstract: For many years, multi-object tracking benchmarks have focused on a handful of categories. Motivated primarily by surveillance and self-driving applications, these datasets provide tracks for people, vehicles, and animals, ignoring the vast majority of objects in the world. By contrast, in the related field of object detection, the introduction of large-scale, diverse datasets (e.g., COCO) has fostered significant progress in developing highly robust solutions. To bridge this gap, we introduce a similarly dive…

Cited by 117 publications (108 citation statements)
References 75 publications
“…There are many datasets focusing on more diverse object categories than person and vehicles. The ImageNet-Vid [12] benchmark provides trajectory annotations for 30 object categories in over 1000 videos and TAO [10] annotates even 833 object categories to study object tracking on long-tailed distribution.…”
Section: Related Work
confidence: 99%
“…There are a number of public datasets with box-level annotations for different video tasks: ImageNet-VID [142] for video object detection; LaSOT [115], GOT-10k [143], Youtube-BB [144], and TrackingNet [145] for single object tracking; MOT [146], TAO [147], Youtube-VOS [15] and Youtube-VIS [16] for multi-object tracking. However, none of these datasets meet the requirement of our proposed few-shot video object detection task.…”
Section: Dataset Collection
confidence: 99%
“…To save human annotation effort as much as possible, rather than building our dataset from scratch, we exploit existing large-scale video datasets for supervised learning, i.e., LaSOT [115], GOT-10k [143], and TAO [147] to construct our dataset subject to the above three criteria by: Dataset Filtering. Note that the above datasets cannot be directly used since they are only partially annotated for tracking task: although multiple objects of a given class are present in the video, only some or even one of them is annotated while others may be ignored.…”
Section: Dataset Collection
confidence: 99%
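The "Dataset Filtering" step quoted above discards videos in which only some instances of a class are annotated. A minimal sketch of that idea, assuming hypothetical data structures (not the actual LaSOT/GOT-10k/TAO annotation formats) where each video records its annotated track count per class and we separately know how many instances are actually present:

```python
# Hypothetical sketch of the dataset-filtering criterion: keep only videos
# where every instance of each present class is annotated, dropping
# partially annotated videos. Field names are illustrative assumptions.

def filter_fully_annotated(videos, instance_counts):
    """Keep videos whose annotated track count per class matches the
    number of instances of that class actually present in the video."""
    kept = []
    for vid in videos:
        annotated = vid["annotated_tracks_per_class"]  # e.g. {"dog": 2}
        present = instance_counts[vid["id"]]           # e.g. {"dog": 2}
        if all(annotated.get(cls, 0) == n for cls, n in present.items()):
            kept.append(vid)
    return kept

videos = [
    {"id": "v1", "annotated_tracks_per_class": {"dog": 2}},
    {"id": "v2", "annotated_tracks_per_class": {"cat": 1}},
]
# v2 contains three cats but only one annotated track, so it is filtered out.
instance_counts = {"v1": {"dog": 2}, "v2": {"cat": 3}}
print([v["id"] for v in filter_fully_annotated(videos, instance_counts)])  # ['v1']
```

In practice the true per-video instance counts would come from additional human verification rather than the annotations themselves, which is exactly why the quoted work treats filtering as a separate step.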
“…For video object identification, we require video object sequences where objects are associated across multiple frames. Hence, to train and evaluate our proposed approach, we used four video instance segmentation datasets: YouTube Video Instance Segmentation (YT-VIS) [51], Unidentified Video Objects (UVO) [47], Occluded Video Instance Segmentation (OVIS) [34], and Tracking Any Object with Video Object Segmentation (TAO-VOS) [8,43]. All these datasets contain a large object vocabulary and various challenging scenarios, including perceptually-aliased occluded objects, as described below:…”
Section: Datasets
confidence: 99%
“…4) TAO-VOS: This dataset is a subset of the Tracking Any Object (TAO) dataset [8] with masks for video object segmentation. TAO is a benchmark federated object tracking dataset comprising videos from 7 datasets captured in diverse environments.…”
Section: Datasets
confidence: 99%