2012
DOI: 10.1007/978-3-642-33863-2_33

Action Recognition Robust to Background Clutter by Using Stereo Vision

Abstract: An action recognition algorithm which works with binocular videos is presented. The proposed method uses a standard bag-of-words approach, where each action clip is represented as a histogram of visual words. However, instead of using classical monocular HoG/HoF features, we construct features from the scene flow computed by a matching algorithm on the sequence of stereo images. The resulting algorithm has comparable or slightly better recognition accuracy than the standard monocular solution in controll…
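
The bag-of-words representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bow_histogram`, the nearest-word assignment by Euclidean distance, the L1 normalisation, and the toy data are all assumptions made for illustration. In the paper the descriptors would be built from scene flow, which is not reproduced here.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Return an L1-normalised histogram of visual words for one clip.

    descriptors: (n, d) array of local feature vectors (hypothetical input).
    codebook:    (k, d) array of visual-word centres (hypothetical input).
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Each descriptor votes for its nearest visual word.
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    # Guard against clips with no descriptors.
    return hist / total if total > 0 else hist


# Toy data: two visual words and four descriptors, two near each word.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [11.0, 10.0]])
clip_histogram = bow_histogram(descriptors, codebook)
print(clip_histogram)
```

The resulting histogram is the fixed-length clip representation that a classifier would consume, regardless of how many descriptors the clip produced.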

Cited by 10 publications (17 citation statements)
References 22 publications
“…This intuition explains the gap in classification rate between methods $E^r$ and $D^r$: with $E^r$ no such filtering takes place and the modest gains in mean average precision, in comparison to the monocular approach, may simply be attributed to the more dense video description, since $E^r_i = C^d_i \cup C^r_i$. It also confirms the conclusions reached in [28], regarding the use of stereoscopic data to exploit video background-foreground segmentation for activity recognition. However, contrary to [28], the proposed method $D^r$ operates along these lines only implicitly, through increasing texture invariance and scene geometry content of the video description, as well as in a generic manner, not associated with any specific feature descriptor.…”
Section: A Experimental Results For the Description Stage
Citation type: supporting
confidence: 88%
“…It also confirms the conclusions reached in [28], regarding the use of stereoscopic data to exploit video background-foreground segmentation for activity recognition. However, contrary to [28], the proposed method $D^r$ operates along these lines only implicitly, through increasing texture invariance and scene geometry content of the video description, as well as in a generic manner, not associated with any specific feature descriptor.…”
Section: A Experimental Results For the Description Stage
Citation type: supporting
confidence: 88%
“…The first scheme is based on a bag-of-words (BoW) approach while the second one incorporates temporal structure by means of HMMs. Scene-flow [15] and STIPs [8] are used as visual features and MFCC as auditory features. The fusion is done at the classification stage through a modality-weighting scheme, i.e., pooling.…”
Section: Related Work and Contributions
Citation type: mentioning
confidence: 99%