2014
DOI: 10.1007/978-3-319-10599-4_52
Domain-Adaptive Discriminative One-Shot Learning of Gestures

Abstract: The objective of this paper is to recognize gestures in videos - both localizing the gesture and classifying it into one of multiple classes. We show that the performance of a gesture classifier learnt from a single (strongly supervised) training example can be boosted significantly using a 'reservoir' of weakly supervised gesture examples (and that the performance exceeds learning from the one-shot example or reservoir alone). The one-shot example and weakly supervised reservoir are from different 'd…

Cited by 69 publications (56 citation statements)
References 25 publications
“…We follow the experimental protocol adopted in [3,14,22,25] and provide precision, recall and F1-score measures on the validation set. We compare our model with Yao et al [25], Wu et al [22], Pfister et al [14], and Fernando et al [3].…”
Section: Methods
confidence: 99%
“…
                    Precision  Recall  F-score
Pfister et al [17]    61.2%    62.3%   61.7%
Yao et al [28]        --       --      56.0%
Wu et al [26]         59.9%    59.3%   59.6%
VideoDarwin [5]       74.0%    73.8%   73.9%
HiVideoDarwin         74.9%    75.6%   74.6%
Table 3. Statistical analysis for parameters.…”
Section: Approach
confidence: 95%
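The F-score column in the table above is the standard F1 measure, i.e. the harmonic mean of precision and recall. A minimal sketch (the function name is illustrative, not from the cited papers):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)

# Pfister et al [17]: precision 61.2%, recall 62.3%
print(round(f1_score(61.2, 62.3), 1))  # 61.7, matching the table row
```

The same check reproduces the VideoDarwin row (74.0%, 73.8% → 73.9%).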
“…For each frame we estimate the body joints using [19] to preprocess these data and extract frame descriptors in the same way as [5]. We report precision, recall, F1-score and mAP on the validation set, as done in [17,28].…”
Section: Datasets
confidence: 99%
“…Subtitles have been exploited for assisting the learning of visual recognizer. Several studies [6,8,31] automatically learn British Sign Language signs from TV broadcasts. Their videos contain a single signer with a stable pose.…”
Section: Related Work
confidence: 99%