2020
DOI: 10.1609/aaai.v34i03.5653
UCF-STAR: A Large Scale Still Image Dataset for Understanding Human Actions

Abstract: Action recognition in still images poses a great challenge due to (i) fewer available training data and (ii) the absence of temporal information. To address the first challenge, we introduce a dataset for STill image Action Recognition (STAR), containing over 1M images across 50 different human body-motion action categories. UCF-STAR is the largest dataset in the literature for action recognition in still images. The key characteristics of UCF-STAR include (1) focusing on human body-motion rather than relatively st…

Cited by 10 publications (5 citation statements) · References 40 publications
“…To thoroughly evaluate the RCAT framework, we conducted comprehensive comparative and ablation studies, emphasizing key metrics like computational efficiency, inference speed, and model complexity. These evaluations were carried out on prominent video classification datasets: UCF101 [11] and HMDB51 [13]. Moreover, to illustrate the model's robust generalization ability, we performed video retrieval experiments on the MSR-VTT [14] dataset, where our framework achieved state-of-the-art performance.…”
Section: Methods (confidence: 99%)
“…This technique meticulously recalibrates the pre-trained CLIP model, enhancing its compatibility with video recognition tasks by fine-tuning the alignment between video and textual features. Consequently, we evolve from Equations (8) and (9) to Equations (10) and (11):…”
Section: Video Adapter Tuning (confidence: 99%)
“…In this respect, initial studies in HAR mostly employed low-level feature extraction techniques to capture low-level structures; however, they fail to achieve reliable and satisfactory results [10]. Another category of approaches leverages object detector [11] or pose estimator [12] developments to detect keypoint joints [13], which are the most discriminative regions in the foreground area. Such detected areas favorably contribute to the overall action recognition accuracy.…”
Section: Introduction (confidence: 99%)