Proceedings of the 17th ACM International Conference on Multimedia 2009
DOI: 10.1145/1631272.1631342
Localizing volumetric motion for action recognition in realistic videos

Abstract: This paper presents a novel motion localization approach for recognizing actions and events in real videos. Examples include StandUp and Kiss in Hollywood movies. The challenge can be attributed to the large visual and motion variations imposed by realistic action poses. Previous works mainly focus on learning from descriptors of cuboids around space time interest points (STIP) to characterize actions. The size, shape and space-time position of cuboids are fixed without considering the underlying motion dynami…
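The fixed-cuboid sampling that the abstract contrasts against can be sketched as a simple slice of the video volume around a detected interest point. This is a minimal illustration, not the paper's method: the cuboid size (7x9x9) and the clamp-to-boundary policy are illustrative assumptions.

```python
import numpy as np

def extract_cuboid(video, t, y, x, size=(7, 9, 9)):
    """Cut a fixed-size space-time cuboid around an interest point.

    video: (T, H, W) grayscale volume; (t, y, x): STIP location.
    The size and shape are fixed, independent of the underlying motion --
    exactly the limitation the abstract points out.
    """
    dt, dy, dx = (s // 2 for s in size)
    T, H, W = video.shape
    # clamp the cuboid origin so the slice stays inside the volume
    # (one common boundary policy; an assumption here)
    t0 = min(max(t - dt, 0), T - size[0])
    y0 = min(max(y - dy, 0), H - size[1])
    x0 = min(max(x - dx, 0), W - size[2])
    return video[t0:t0 + size[0], y0:y0 + size[1], x0:x0 + size[2]]
```

A descriptor (e.g. histograms of gradients or flow) would then be computed inside the returned block.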

Cited by 7 publications (6 citation statements); references 9 publications.
“…In addition, the actions they include are non-periodic and well defined in time. These sets, although new, have already drawn a lot of attention (see for example [14], [28], [29], [30]). …”
Section: Data-setsmentioning
confidence: 99%
“…Our descriptor does this by coarsely quantifying appearance and motion inside this region. This is in contrast to other approaches in single-action recognition [7,11,12,15,22], where features are estimated over the whole frame or video and then clustered to localise where the action is happening. Another advantage of implementing a person-centred descriptor is that, depending on the camera angle, both persons are not always visible in a given frame, and we would like to be able to provide a classification in these instances.…”
Section: Modeling Human Activitymentioning
confidence: 98%
“…Previous work in two-person interaction recognition is scarce compared to closely related areas such as single-person action recognition [7,10,12,22], group action recognition [14,24] and human-object interaction recognition [16,23]. Closer to our work are [4,17,19], where interactions are generally recognised in a hierarchical manner putting special attention on higher level descriptions and using very constrained data.…”
Section: Introductionmentioning
confidence: 96%
“…We use local features because they are better at handling occlusions and appearance changes [13] and thus more suitable for crowded environments. The different categories of local features include volumetric descriptors computed after 3D interest point detection [12,14,16], which encode information in local spatio-temporal blocks; trajectory descriptors [11,15,17], which track spatial (2D) interest points over time; and flow descriptors [1,4], which use dense optical flow information. In this paper we use a trajectory-based descriptor similar to that of Sun et al [15], which has been shown to perform well on complex action recognition datasets such as the Hollywood dataset [9].…”
Section: Action Representationmentioning
confidence: 99%
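The trajectory-descriptor category mentioned in the last statement can be sketched in a few lines: points are propagated through per-frame dense flow fields, and the frame-to-frame displacements, normalised, form the descriptor. This is a generic illustration of the idea (in the spirit of dense-trajectory methods), not the cited authors' implementation; the nearest-pixel flow lookup and the trajectory length are assumptions.

```python
import numpy as np

def track_trajectories(flows, points, length=15):
    """Propagate 2D points through a stack of dense flow fields.

    flows:  list of (H, W, 2) arrays giving per-pixel (dx, dy) motion.
    points: (N, 2) array of (x, y) start positions.
    Returns (N, T+1, 2) trajectories, T = min(len(flows), length).
    """
    H, W, _ = flows[0].shape
    T = min(len(flows), length)
    traj = np.zeros((len(points), T + 1, 2))
    traj[:, 0] = points
    for t in range(T):
        # nearest-pixel flow lookup (a simplifying assumption;
        # real implementations use bilinear interpolation)
        xs = np.clip(np.round(traj[:, t, 0]).astype(int), 0, W - 1)
        ys = np.clip(np.round(traj[:, t, 1]).astype(int), 0, H - 1)
        traj[:, t + 1] = traj[:, t] + flows[t][ys, xs]
    return traj

def trajectory_descriptor(traj):
    """Normalised displacement vectors: one flat descriptor per trajectory."""
    disp = np.diff(traj, axis=1)                              # (N, T, 2)
    norm = np.linalg.norm(disp, axis=(1, 2), keepdims=True) + 1e-8
    return (disp / norm).reshape(len(traj), -1)
```

Appearance information (e.g. HOG/HOF in a tube around each trajectory) would typically be appended to this motion-shape descriptor.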