2021
DOI: 10.1109/access.2021.3115476
|View full text |Cite
|
Sign up to set email alerts
|

Video Action Understanding

Abstract: Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding. However, due to the scale and temporal nature of video, the span of video understanding problems and the set of proposed deep learning solutions is arguably wider and more diverse than those of their 2D image siblings. Finding, identifying, and predicting actions are a few of the most salient tasks in this emerging and rapidly evolving field. With a pedagogical emphasis, this … Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
11
0
8

Year Published

2022
2022
2024
2024

Publication Types

Select...
6
1
1

Relationship

0
8

Authors

Journals

citations
Cited by 25 publications
(19 citation statements)
references
References 200 publications
(224 reference statements)
0
11
0
8
Order By: Relevance
“…Action recognition has been an actively studied topic [1] over the last two decades, and various models have been devised to capture the temporal information, such as X3D [18] with 3D CNN, as well as recent models [19] based on Vision Transformer [20]. However, they all require one model per domain and usually each dataset is used to train and validate models separately for performance evaluation.…”
Section: Action Recognition and Domain Adaptationmentioning
confidence: 99%
See 1 more Smart Citation
“…Action recognition has been an actively studied topic [1] over the last two decades, and various models have been devised to capture the temporal information, such as X3D [18] with 3D CNN, as well as recent models [19] based on Vision Transformer [20]. However, they all require one model per domain and usually each dataset is used to train and validate models separately for performance evaluation.…”
Section: Action Recognition and Domain Adaptationmentioning
confidence: 99%
“…Video recognition tasks [1], especially recognition of human actions, has become important in various real-world applications, and therefore many methods have been proposed. In order to train deep models, it is necessary to collect a variety of videos of human actions in various situations, therefore many datasets have been proposed [2]- [4].…”
Section: Introductionmentioning
confidence: 99%
“…In recent years, human action detection has become an active area of research driven by numerous vision applications and industries such as autonomous driving, security monitoring, transport and human-computer interaction systems, etc. The task (also referred to as action localization) concerns creating spatiotemporal action proposals to locate individual actors in space and time from a video, as well as classifying their undergoing action categories [1]. Inherently, action detection imposes more challenges in general when compared to action recognition which seeks only the global label of the video.…”
Section: Introductionmentioning
confidence: 99%
“…Именно вторая группа алгоритмов является объектом интереса данной работы. К основным задачам понимания видео относят такие категории задач как распознавание действия, предсказание действия и локализация действия [1].…”
Section: Introductionunclassified
“…Следующей задачей понимания видео является предсказание действий, внутри которого также выделяют несколько подзадач: предвосхищение действия и раннее предсказание действий [1]. В задаче предвосхищения действий классификация осуществляется на основании контекстуальных подсказок, в то время как действие еще не начало выполняться.…”
Section: Introductionunclassified