2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2015.7299065

Watch-n-Patch: Unsupervised Understanding of Actions and Relations

Abstract: We focus on modeling human activities comprising multiple actions in a completely unsupervised setting. Our model learns the high-level action co-occurrence and temporal relations between the actions in the activity video. We consider the video as a sequence of short-term action clips, called action-words, and an activity is about a set of action-topics indicating which actions are present in the video. Then we propose a new probabilistic model relating the action-words and the action-topics. It allows us to m…
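A minimal sketch of the action-word / action-topic pipeline the abstract describes, using k-means quantization and an off-the-shelf LDA topic model as stand-ins. The paper's actual contribution is a richer probabilistic model that also captures co-occurrence and long-range temporal relations between topics; all feature names, shapes, and hyperparameters below are illustrative assumptions.

```python
# Sketch: quantize short-term clips into discrete "action-words", then infer
# latent "action-topics" per video with a vanilla topic model (LDA stand-in).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_videos, clips_per_video, feat_dim = 20, 30, 64
n_action_words, n_action_topics = 50, 8

# Assumed input: one descriptor per short-term clip (e.g., pooled RGB-D/skeleton features).
clip_features = rng.normal(size=(n_videos * clips_per_video, feat_dim))

# 1) Quantize clips into discrete action-words.
kmeans = KMeans(n_clusters=n_action_words, n_init=10, random_state=0).fit(clip_features)
words = kmeans.labels_.reshape(n_videos, clips_per_video)

# 2) Each video becomes a bag of action-words; counts feed the topic model.
counts = np.zeros((n_videos, n_action_words), dtype=int)
for v in range(n_videos):
    np.add.at(counts[v], words[v], 1)

# 3) Action-topics = latent actions; theta[v] indicates which actions video v contains.
lda = LatentDirichletAllocation(n_components=n_action_topics, random_state=0)
theta = lda.fit_transform(counts)  # per-video topic mixture
print(theta[0].round(2))           # e.g., dominant actions in video 0
```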

Citations: cited by 109 publications (118 citation statements)
References: 38 publications
“…For fair evaluations and comparisons, we evaluate the proposed algorithm on three types of datasets: (i) Real data with full annotation on PiGraphs dataset [34] with limited 3D scenes. (ii) Real data with partial annotation on daily activity dataset Watch-n-Patch [47], which only contains ground-truth depth information and annotations of 3D human poses. (iii) Synthetic data with generated annotations to serve as the ground truth: we sample 3D human poses of various activities in SUN RGB-D dataset [38] and project the sampled skeletons back onto the 2D image plane.…”
Section: Methods (mentioning, confidence: 99%)
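The synthetic-data protocol in the excerpt above projects sampled 3D skeletons back onto the 2D image plane. A minimal pinhole-projection sketch; the intrinsics (fx, fy, cx, cy) are placeholder values, not SUN RGB-D's actual calibration.

```python
# Sketch: project camera-frame 3D joints to pixel coordinates with a pinhole model.
import numpy as np

def project_to_image(joints_3d, fx=525.0, fy=525.0, cx=320.0, cy=240.0):
    """joints_3d: (K, 3) camera-frame coordinates in meters, Z pointing forward."""
    X, Y, Z = joints_3d[:, 0], joints_3d[:, 1], joints_3d[:, 2]
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return np.stack([u, v], axis=1)  # (K, 2) pixel coordinates

skeleton = np.array([[0.1, -0.2, 2.5], [0.0, 0.0, 2.4], [-0.1, 0.3, 2.6]])
print(project_to_image(skeleton))
```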
“…Experimental results on PiGraphs [34], Watch-n-Patch [47], and SUN RGB-D [38] demonstrate that the proposed method outperforms state-of-the-art methods for both 3D scene reconstruction and 3D pose estimation. Moreover, the ablative analysis shows that the HOI prior improves the reconstruction, and the physical common sense helps to make physically plausible predictions.…”
Section: Physics Commonsense (mentioning, confidence: 99%)
“…4. Specifically, given K human joints with …

Reference | Representation | Encoding | Level | Construction
[175] | Vector of Joints | Conc | Lowlv | Hand
Patsadu et al. [176] | Vector of Joints | Conc | Lowlv | Hand
Huang and Kitani [177] | Cost Topology | Stat | Lowlv | Hand
Devanne et al. [178] | Motion Units | Conc | Manif | Hand
Wang et al. [179] | Motion Poselets | BoW | Body | Dict
Wei et al. [180] | Structural Prediction | Conc | Lowlv | Hand
Gupta et al. [181] | 3D Pose w/o Body Parts | Conc | Lowlv | Hand
Amor et al. [182] | Skeleton's Shape | Conc | Manif | Hand
Sheikh et al. [183] | Action Space | Conc | Lowlv | Hand
Yilma and Shah [184] | Multiview Geometry | Conc | Lowlv | Hand
Gong et al. [185] | Structured Time | Conc | Manif | Hand
Rahmani and Mian [186] | Knowledge Transfer | BoW | Lowlv | Dict
Munsell et al. [187] | Motion Biometrics | Stat | Lowlv | Hand
Lillo et al. [188] | Composable Activities | BoW | Lowlv | Dict
Wu et al. [189] | Watch-n-Patch | BoW | Lowlv | Dict
Gong and Medioni [190] | Dynamic Manifolds | BoW | Manif | Dict
Han et al. [191] | Hierarchical Manifolds | BoW | Manif | Dict
Slama et al. [192,193] | Grassmann Manifolds | BoW | Manif | Dict
Devanne et al. [194] | Riemannian Manifolds | Conc | Manif | Hand
Huang et al. [195] | Shape Tracking | Conc | Lowlv | Hand
Devanne et al. [196] | Riemannian Manifolds | Conc | Manif | Hand
Zhu et al. [197] | RNN with LSTM | Conc | Lowlv | Deep
Chen et al. [198] | EnwMi Learning | BoW | Lowlv | Dict
Hussein et al. [199] | Covariance of 3D Joints | Stat | Lowlv | Hand
Shahroudy et al. [200] | MMMP | BoW | Body | Unsup
Jung and Hong [201] | Elementary Moving Pose | BoW | Lowlv | Dict
Evangelidis et al. [202] | Skeletal Quad | Conc | Lowlv | Hand
Azary and Savakis [203] | Grassmann Manifolds | Conc | Manif | Hand
Barnachon et al. [204] | Hist. of Action Poses | Stat | Lowlv | Hand
Shahroudy et al. [205] | Feature Fusion | BoW | Body | Unsup
Cavazza et al. [206] | Kernelized-COV | Stat | Lowlv | Hand …”
Section: Representations Based on Raw Joint Positions (mentioning, confidence: 99%)
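For the encoding abbreviations in the table above, "Conc" concatenates per-frame joint vectors and "Stat" summarizes a sequence with fixed-size statistics (e.g., the joint-covariance descriptor of Hussein et al. [199]). A minimal illustrative sketch; the shapes and synthetic data are assumptions.

```python
# Sketch: two encoding families from the table, applied to a synthetic skeleton sequence.
import numpy as np

T, K = 40, 15                                # frames, joints
seq = np.random.default_rng(1).normal(size=(T, K, 3))

# Conc: flatten each frame's joints and stack over time -> length grows with T.
conc_feature = seq.reshape(T, -1).ravel()    # (T*K*3,)

# Stat: covariance of per-frame joint vectors over time -> fixed size, independent of T.
frames = seq.reshape(T, -1)                  # (T, 3K)
stat_feature = np.cov(frames, rowvar=False)  # (3K, 3K) covariance descriptor

print(conc_feature.shape, stat_feature.shape)
```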
“…Given a new instance, this encoding methodology uses the normalized frequency vector of code occurrence as the final feature vector. Bag-of-words encoding is widely employed by a large number of skeleton-based human representations [174,143,144,210,186,109,150,127,188,189,115,152,190,192,159,147,165,167,179,162,218,219]. According to how the dictionary is learned, the encoding methods can be broadly categorized into two groups, based on clustering or sparse coding.…”
Section: Bag-of-words Encoding (mentioning, confidence: 99%)
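A minimal sketch of the bag-of-words encoding the excerpt describes: learn a codebook by clustering pose descriptors, then represent a new instance by the normalized frequency vector of code occurrences. The descriptor dimensions and codebook size are assumptions; the sparse-coding variant mentioned in the excerpt would swap KMeans for dictionary learning (e.g., scikit-learn's DictionaryLearning).

```python
# Sketch: k-means codebook + normalized code-occurrence histogram per sequence.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
train_descriptors = rng.normal(size=(1000, 45))  # assumed per-frame pose features
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(train_descriptors)

def bow_encode(sequence):
    """sequence: (T, 45) descriptors -> (64,) normalized code histogram."""
    codes = codebook.predict(sequence)
    hist = np.bincount(codes, minlength=64).astype(float)
    return hist / hist.sum()

new_sequence = rng.normal(size=(30, 45))
print(bow_encode(new_sequence).sum())            # 1.0: normalized frequencies
```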