Multi-feature max-margin hierarchical Bayesian model for action recognition

Yang, Shuang; Yuan, Chunfeng; Wu, Baoxin; Hu, Weiming; Wang, Fangshi

doi:10.1109/cvpr.2015.7298769

Cited by 23 publications (10 citation statements)

References 25 publications (32 reference statements)


“…For example, by including label information, Bregonzio et al [2] add a non-uniform topic proportion constraint on LDA to discover class-specific topics of actions in video. Similarly, a multi-feature Bayesian model [45] is proposed on the basis of LDA to jointly learn the mixture distributions of sparse and dense motion attributes. In the attempt of skeleton-based motion analysis, seven pairs of geometric features of limb actions are processed and weighted to generate the text-based motion description [51] which ignores the order of motion.…”

Section: Related Work · mentioning

confidence: 99%

Retrieval of spatial–temporal motion topics from 3D skeleton data

Men

Leung

2019

Vis Comput


Retrieval of a specific human motion from 3D skeleton data is intractable because of its articulated complexity. We propose a context-based motion document formation method to reflect geometric variations by calculating covariance descriptors among skeletal joint locations and joint relative distances, and temporal variations by performing a coarse-to-fine segmentation on the motion sequence. The descriptors of query motion traverse all the motion categories to lock its motion words, which can be regarded as the basic units of a motion document. The discrete motion words of different spatiotemporal descriptors are also mapped to divergent index ranges to add prior knowledge of motion with temporal order to latent Dirichlet allocation (LDA). The similarity matching is based on motion-topic distributions from LDA with semantic meanings. The experiments on public datasets show the effectiveness and robustness of the proposed method over existing models.
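The abstract above describes computing covariance descriptors over skeletal joint locations and joint relative distances, and mapping motion words from different descriptors to disjoint index ranges before LDA. The following is a minimal pure-Python sketch of both steps under stated assumptions: the per-frame feature layout (concatenated joint coordinates plus all pairwise joint distances), the use of a single covariance over that combined vector, and the helper names `frame_features`, `covariance_descriptor` and `offset_words` are all hypothetical and not taken from the paper.

```python
from itertools import combinations
from typing import List, Sequence

# Hypothetical input layout: a frame is a list of joints, each joint an (x, y, z) tuple.
Joint = Sequence[float]
Frame = Sequence[Joint]


def frame_features(joints: Frame) -> List[float]:
    """Concatenate joint coordinates with all pairwise joint distances (assumed layout)."""
    coords = [c for joint in joints for c in joint]
    dists = [
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for p, q in combinations(joints, 2)
    ]
    return coords + dists


def covariance_descriptor(segment: Sequence[Frame]) -> List[float]:
    """Sample covariance of per-frame features over one temporal segment.

    The matrix is symmetric, so only its upper triangle (including the
    diagonal) is returned as a flat vector.
    """
    feats = [frame_features(f) for f in segment]
    n = len(feats)
    if n < 2:
        raise ValueError("need at least two frames to estimate a covariance")
    d = len(feats[0])
    mean = [sum(col) / n for col in zip(*feats)]
    centered = [[x - m for x, m in zip(row, mean)] for row in feats]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return [cov[i][j] for i in range(d) for j in range(i, d)]


def offset_words(words_by_descriptor: Sequence[Sequence[int]],
                 vocab_sizes: Sequence[int]) -> List[int]:
    """Shift each descriptor's word ids into its own disjoint index range.

    Descriptor k's words are offset by the total vocabulary size of
    descriptors 0..k-1, so words from different descriptors never collide.
    """
    merged: List[int] = []
    offset = 0
    for words, size in zip(words_by_descriptor, vocab_sizes):
        for w in words:
            if not 0 <= w < size:
                raise ValueError(f"word id {w} outside vocabulary of size {size}")
            merged.append(offset + w)
        offset += size
    return merged
```

Flattening only the upper triangle keeps the descriptor compact because a covariance matrix is symmetric. The quantization step that turns these vectors into discrete motion words is not specified here and is omitted; `offset_words` assumes it has already produced integer word ids per descriptor type.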



“…However, the independence assumption of different topics would lead to non-smooth temporal segmentations. Recently, a multi-feature max-margin hierarchical Bayesian model [60] is proposed to jointly learn a high-level representation by combining a hierarchical generative model and discriminative max-margin classifiers in a unified Bayesian framework. Differently, our model considers both correlations and the relative time distributions between topics rather than the absolute time, which captures richer information of action structures in the complex human activity.…”

Section: Related Work · mentioning

confidence: 99%

Watch-n-Patch: Unsupervised Learning of Actions and Relations

Zhang

Şener

et al. 2018

IEEE Trans. Pattern Anal. Mach. Intell.


There is a large variation in the activities that humans perform in their everyday lives. We consider modeling these composite human activities, which comprise multiple basic-level actions, in a completely unsupervised setting. Our model learns high-level co-occurrence and temporal relations between the actions. We consider the video as a sequence of short-term action clips, each containing human-words and object-words. An activity is about a set of action-topics and object-topics indicating which actions are present and which objects are being interacted with. We then propose a new probabilistic model relating the words and the topics. It allows us to model long-range action relations that commonly exist in composite activities, which was challenging for previous work. We apply our model to unsupervised action segmentation and clustering, and to a novel application that detects forgotten actions, which we call action patching. For evaluation, we contribute a new challenging RGB-D activity video dataset recorded by the new Kinect v2, which contains several human daily activities as compositions of multiple actions interacting with different objects. Moreover, we develop a robotic system that watches and reminds people using our action patching algorithm. Our robotic setup can be easily deployed on any assistive robot.


“…Alfaro et al [1] use a set of pooled key-sequences to quantify relative local intra- and inter-class similarities by projecting the key-sequences to a bank of dictionaries encoding patterns from different temporal positions or action classes. Yang et al [33] jointly learn a high-level representation by combining a hierarchical generative model (that represents actions by distributions over latent spatial temporal patterns) and discriminative max-margin classifiers in a unified Bayesian framework. Fernando et al [6] propose a hierarchical rank pooling (based on [8]), but on video segments; the first layer performs rank pooling on CNN feature maps and subsequent layers on the result of previous rank pooling operations.…”

Section: Related Work · mentioning

confidence: 99%

Darwintrees for Action Recognition

Clapés

Tuytelaars

Escalera

2017

2017 IEEE International Conference on Computer Vision Workshops (ICCVW)


We propose a novel mid-level representation for action/activity recognition on RGB videos. We model the evolution of improved dense trajectory features not only for the entire video sequence, but also on subparts of the video. Subparts are obtained using a spectral divisive clustering that yields an unordered binary tree decomposing the entire cloud of trajectories of a sequence. We then compute videodarwin on video subparts, exploiting more fine-grained temporal information and reducing the sensitivity of the standard time-varying mean strategy of videodarwin. After decomposition, we model the evolution of features through both frames of subparts and descending/ascending paths in tree branches. We refer to these mid-level representations as node-darwintree and branch-darwintree respectively. For the final classification, we construct a kernel representation for both mid-level and holistic videodarwin representations. Our approach achieves better performance than standard videodarwin and defines the current state-of-the-art on UCF-Sports and Highfive action recognition datasets.
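Videodarwin, which the abstract above applies to video subparts, represents a sequence by the parameters of a linear function that orders its time-varying mean vectors in time. The following is a minimal pure-Python approximation, not the original method: it swaps in ridge regression of the frame index onto the time-varying means in place of the ranking-based learner used by videodarwin, and it omits the vector normalization and non-linear feature maps. The names `time_varying_mean`, `solve_linear` and `darwin_vector` and the default `lam` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def time_varying_mean(frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Running mean m_t = (x_1 + ... + x_t) / t for every frame t."""
    means: List[Vector] = []
    acc = [0.0] * len(frames[0])
    for t, x in enumerate(frames, start=1):
        acc = [a + v for a, v in zip(acc, x)]
        means.append([a / t for a in acc])
    return means


def solve_linear(a: List[Vector], b: Vector) -> Vector:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def darwin_vector(frames: Sequence[Sequence[float]], lam: float = 1e-3) -> Vector:
    """Ridge-regression stand-in for videodarwin (hypothetical simplification).

    Fits w minimizing sum_t (w . m_t - t)^2 + lam * ||w||^2, so that
    w . m_t grows with time; w is then used as the sequence descriptor.
    """
    v = time_varying_mean(frames)
    d = len(v[0])
    ts = range(1, len(v) + 1)
    # Normal equations: (V^T V + lam * I) w = V^T t
    gram = [
        [sum(row[i] * row[j] for row in v) + (lam if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    rhs = [sum(row[i] * t for row, t in zip(v, ts)) for i in range(d)]
    return solve_linear(gram, rhs)
```

The ridge term keeps the normal equations solvable even when feature dimensions are collinear. For a node-darwintree-style representation, the same function could be applied to the per-frame features of each subpart produced by the tree decomposition, which is not shown here.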
