Human Action Recognition Using Temporal Segmentation and Accordion Representation

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

2014

Self Cite

In this paper, we propose a spatio-temporal pyramid representation (STPR) of the video-based Accordion image. The Accordion image places pixels with high temporal correlation in spatial adjacency. The STPR adds spatial and temporal layout information to the local SIFT features computed on the Accordion image. It first applies a temporal pyramid decomposition to the video, dividing it into a sequence of increasingly finer temporal blocks, and then builds a spatial pyramid representation on the Accordion image of each temporal block. Multiple Kernel Learning is used to combine the histograms from the different spatio-temporal pyramid levels. Experiments on the human action recognition datasets Hollywood2 and Olympic Sports show the effectiveness of the proposed approach.
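The first two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `accordion_image` and `temporal_pyramid` are hypothetical, and the column-interleaving layout (with frame order reversed on odd columns) is an assumed reading of how temporally correlated pixels are made spatially adjacent. SIFT extraction, the spatial pyramid, and Multiple Kernel Learning are left out.

```python
import numpy as np


def accordion_image(block):
    """Unfold a (T, H, W) grayscale block into an (H, W*T) image.

    Assumed layout: column x of every frame is placed side by side,
    so pixels at the same location in successive frames become neighbors.
    """
    t, h, w = block.shape
    # Reorder to (H, W, T): for each pixel column x, its T temporal samples.
    cols = block.transpose(1, 2, 0).copy()
    # Assumption: reverse the frame order on odd columns (the "fold"),
    # so the last frame of one column sits next to the last frame of the next.
    cols[:, 1::2, :] = cols[:, 1::2, ::-1].copy()
    return cols.reshape(h, w * t)


def temporal_pyramid(video, levels):
    """Split a (T, H, W) video into 2**l temporal blocks at each level l."""
    return [np.array_split(video, 2 ** l, axis=0) for l in range(levels)]


# Toy video: 4 frames of 2x3 pixels.
video = np.arange(24).reshape(4, 2, 3)
pyramid = temporal_pyramid(video, levels=2)
# One Accordion image per temporal block, at every pyramid level.
accordions = [[accordion_image(b) for b in level] for level in pyramid]
```

Under these assumptions, level 0 yields a single Accordion image covering the whole clip and level 1 yields two, one per half of the clip; local features would then be computed on each image.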


“…In this work, our motion descriptor achieves 57.5%. It outperforms the approaches proposed in [25,26,11,14] and gives similar results with the MBH+STP descriptor [6].…”

Section: Hollywood2 Results (mentioning)

confidence: 58%

Section: Hollywood2 Results (mentioning)

confidence: 72%

Section: Hollywood2 Results (mentioning)

confidence: 99%

Section: Introduction (mentioning)

confidence: 99%


Spatio-temporal pyramidal accordion representation for human action recognition

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

2014

Self Cite


“…Having no previous knowledge about the location of the person in each video frame, the human action in a video stream can be recovered from a great number of local descriptors extracted from the video frames (Sekma et al, 2013), (Dammak et al, 2012), (Sekma et al, 2014). Local descriptors, coupled with the bag-of-words (BOW) encoding method (Sivic and Zisserman, 2003) (Mejdoub et al, 2008) (Mejdoub et al, 2007) have recently become a very popular video representation (Ben Aoun et al, 2014), (Knopp et al, 2010), (Laptev et al, 2008), (Wang et al, 2009), (Alexander et al, 2008), (Wang et al, 2011), (Raptis and Soatto, 2010), (Pyry et al, 2010), (Jiang et al, 2012) and (Jain et al, 2013).…”

Section: Introduction (mentioning)

confidence: 99%

Human action recognition based on multi-layer Fisher vector encoding method

Pattern Recognition Letters

2015

Self Cite

Bag of Graphs with Geometric Relationships Among Trajectories for Better Human Action Recognition

Image Analysis and Processing — ICIAP 2015

2015

Self Cite