2017
DOI: 10.48550/arxiv.1706.04261
Preprint

The "something something" video database for learning and evaluating visual common sense

Cited by 13 publications
(15 citation statements)
“…The task formulation is aimed at achieving an "intuitive" figure-understanding system, that does not resort to inverting the visualization pipeline. This is in line with the recent trend in visual-textual datasets, such as those for intuitive physics and reasoning (Goyal et al., 2017; Mun et al., 2016).…”
Section: Related Work (supporting)
confidence: 87%
“…Many video datasets are available to test models of action recognition or detection, including Hollywood2 [22], LabelMe video [40], UCF101 [31], HMDB51 [21], THUMOS [18], AVA [13], "something something" [12] and Charades [29]. Training deep neural networks for these tasks requires available large video datasets, like ActivityNet [6], Kinetics [19], Moments in Time [26], or YouTube-8M [1].…”
Section: Video Datasets and Models (mentioning)
confidence: 99%
“…Several large-scale video datasets provide a large diversity and coverage in terms of the categories of activities and exemplars they capture [19], [12], [26]. However, these labeled datasets only provide a single annotated label for each video and this label may not cover the rich spectrum of events occurring in the video.…”
Section: Introduction (mentioning)
confidence: 99%
“…Additionally in this work the following datasets are used: NIST TRECVID Twitter vines [1], TGIF [18], MSVD [2], YouCook2 [43], Something-something V2 [10], Kinetics 700 [31], HowTo100M [23].…”
Section: Datasets (mentioning)
confidence: 99%