Graph-based approach for human action recognition using spatio-temporal features

Aoun, Najib Ben; Mejdoub, Mahmoud; Amar, Chokri Ben

doi:10.1016/j.jvcir.2013.11.003

Cited by 55 publications

(7 citation statements)

References 28 publications

“…Having no previous knowledge about the location of the person in each video frame, the human action in a video stream can be recovered from a great number of local descriptors extracted from the video frames (Sekma et al, 2013), (Dammak et al, 2012), (Sekma et al, 2014). Local descriptors, coupled with the bag-of-words (BOW) encoding method (Sivic and Zisserman, 2003) (Mejdoub et al, 2008) (Mejdoub et al, 2007) have recently become a very popular video representation (Ben Aoun et al, 2014), (Knopp et al, 2010), (Laptev et al, 2008), (Wang et al, 2009), (Alexander et al, 2008), (Wang et al, 2011), (Raptis and Soatto, 2010), (Pyry et al, 2010), (Jiang et al, 2012) and (Jain et al, 2013). The BOW uses a codebook to create a representation based on the visual content of a video, where the codebook is a set of visual words that represents the distribution of features of all the video.…”

Section: Introduction (mentioning)

confidence: 99%

Human action recognition based on multi-layer Fisher vector encoding method

Sekma

Mejdoub

Amar

2015

Pattern Recognition Letters

Self Cite

“…It aims to group image pixels into semantically meaningful regions. It has been used for many applications such as video action and event recognition [Wal10a,Ben11a,Ben14a,Ben14b,Mej15a], image search engines [Wan14a,Ben10a], augmented reality [Alh17a], image and video coding [Ben11b,Ben12a],…”

Section: Introduction (mentioning)

confidence: 99%

Multiscale Fully Convolutional DenseNet for Semantic Segmentation

Brahimi

Aoun

Amar

et al. 2018

JWSCG

Self Cite

In the computer vision field, semantic segmentation represents a very interesting task. Convolutional Neural Network methods have shown their great performances in comparison with other semantic segmentation methods. In this paper, we propose a multiscale fully convolutional DenseNet approach for semantic segmentation. Our approach is based on the successful fully convolutional DenseNet method. It is reinforced by integrating a multiscale kernel prediction after the last dense block which performs model averaging over different spatial scales and provides more flexibility of our network to presume more information. Experiments on two semantic segmentation benchmarks: CamVid and Cityscapes have shown the effectiveness of our approach which has outperformed many recent works.

“…Indeed, actual person re-ID works are divided according to two main categories: shallow and deep methods. Shallow methods are specifically based on the appearance hand-crafted features [1,2,3,4,5,6,7,12,13,14,15]. In this context, two types of features are distinguished: low-level as well as mid-level ones.…”

Section: Introduction (mentioning)

confidence: 99%

Person re-ID while Crossing Different Cameras: Combination of Salient-Gaussian Weighted BossaNova and Fisher Vector Encodings

Mejdoub

Ksibi

Amar

et al. 2017

ijacsa

Self Cite

Abstract: Person re-identification (re-ID) is a challenging task in the camera surveillance field, since it addresses the problem of re-identifying people across multiple non-overlapping cameras. Most of existing approaches have been concentrated on: 1) achieving a robust and effective feature representation; and 2) enforcing discriminative metric learning to predict if two images represent the same identity. In this context, we present a new approach for person re-ID built upon multi-level descriptors. This is achieved by combining three complementary representations: salient-Gaussian Fisher Vector (SGFV) encoding method, salient-Gaussian BossaNova (SGBN) histogram encoding method and deep Convolutional Neural Network (CNN) features. The two first methods adapt the histogram encoding framework to the person re-ID task. This is achieved by integrating the pedestrian saliency map and the spatial location information, in the histogram encoding process. On one hand, human saliency is reliable and distinctive in the person re-ID task, since it can model the uniqueness of the identity. On the other hand, localizing a person in the image can effectively discard noisy background information. Finally, one of the most advanced metric learning in person re-ID: the Cross-view Quadratic Discriminant Analysis (XQDA) is applied on the top of the resulting description. The proposed method yields promising person re-ID results on two challenging image-based person re-ID benchmarks: CUHK03 and Market-1501.
