2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01232

Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition

Abstract: Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders progress towards advanced video architectures. This paper presents an in-depth study of using large volumes of web videos to pre-train video models for the task of action recognition. Our primary empirical finding is that pre-training at a very large scale (over 65 million videos), despite relying on noisy social-media videos and hashtags, substantially improves t…

Cited by 280 publications (235 citation statements)
References 67 publications (121 reference statements)
“…Analysis of results. Subject-specific attributes such as male and bald are evidently more transferable from recognition (left columns of Table 1) than attributes that are related to … Although this relationship has been noted by others, previous work used domain knowledge to determine which attributes are more transferable from identity [35], as others have done in other domains [20,38]. By comparison, our work shows how these relationships emerge from our estimation of transferability.…”
Section: Case Study: Identity to Facial Attributes (supporting, confidence: 54%)
“…Results and Discussion. Our results are presented in Table 6 and compared to state-of-the-art methods [69], [72], [96], [97], [98]. Our final model, using only RGB frames, achieves state-of-the-art results in comparison to all prior work, including those that use optical flow [72], an object detector [98], or audio data [69].…”
Section: Extension To Epic-kitchens Dataset (mentioning, confidence: 92%)
“…Our ip-CSN-152 is still 0.6% lower than SlowFast augmented with Non-Local Networks. Finally, recent work [13] has shown that R(2+1)D can achieve strong performance when pre-trained on a large-scale weakly-supervised dataset. We pre-train/finetune ir- and ip-CSN-152 on the same dataset and compare them with R(2+1)D-152 (the last three rows of Table 5).…”
Section: Comparison With the State-of-the-art (mentioning, confidence: 98%)