2018
DOI: 10.1109/tpami.2017.2679054

Watch-n-Patch: Unsupervised Learning of Actions and Relations

Abstract: There is a large variation in the activities that humans perform in their everyday lives. We consider modeling these composite human activities, which comprise multiple basic-level actions, in a completely unsupervised setting. Our model learns high-level co-occurrence and temporal relations between the actions. We consider the video as a sequence of short-term action clips, which contain human-words and object-words. An activity is about a set of action-topics and object-topics indicating which actions are pr…
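The representation the abstract describes (a video as a sequence of short-term clips, each carrying discrete human-words and object-words) is essentially a bag-of-words quantization. Below is a minimal sketch of that idea, assuming per-clip human-pose and object features have already been extracted; the variable names (`human_feats`, `object_feats`) and codebook sizes are illustrative placeholders, not the paper's actual pipeline.

```python
# Sketch: quantize continuous clip features into discrete
# "human-words" and "object-words" via k-means codebooks
# (assumed setup, not the paper's exact method).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical pre-extracted features: one row per short-term clip.
human_feats = rng.normal(size=(500, 32))   # e.g. skeleton descriptors
object_feats = rng.normal(size=(500, 16))  # e.g. object appearance

# Learn codebooks; each cluster index becomes a discrete word.
human_codebook = KMeans(n_clusters=50, n_init=10, random_state=0).fit(human_feats)
object_codebook = KMeans(n_clusters=30, n_init=10, random_state=0).fit(object_feats)

human_words = human_codebook.predict(human_feats)    # one word id per clip
object_words = object_codebook.predict(object_feats)

# A video is then a sequence of (human-word, object-word) tokens,
# ready for topic modeling over action-topics and object-topics.
video_tokens = list(zip(human_words.tolist(), object_words.tolist()))
print(video_tokens[:5])
```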

Cited by 22 publications (22 citation statements) | References: 59 publications
“…The recording is not long-term, as their research focus is on analysing multiple individuals performing the same task. The morning dataset from [18] and the Watch-n-Patch dataset [19] both contain no repetitive activities per person and are mostly scripted, so they do not meet the requirement for routine discovery.…”
Section: Datasets and Results
confidence: 99%
“…LDA models the probability that a video (document) is generated from a set of actions (topics), where each action (topic) is a distribution over words from a codebook. Such models have been employed for activity recognition [37] and forgotten action detection [38].…”
Section: B. Stochastic Approaches
confidence: 99%
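As a concrete illustration of the document/topic analogy in the quote above, here is a hedged sketch using scikit-learn's `LatentDirichletAllocation` on toy video "documents" whose words are codebook indices. The counts and sizes are made up, and this is plain LDA, not the causal/temporal extensions the paper builds on top of it.

```python
# Toy LDA over video "documents" of codebook words (illustrative only).
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_videos, vocab_size = 20, 80   # toy sizes; vocab = codebook entries

# Each row counts how often each codebook word occurs in one video.
word_counts = rng.integers(0, 5, size=(n_videos, vocab_size))

lda = LatentDirichletAllocation(n_components=6, random_state=0)  # 6 "action-topics"
video_topics = lda.fit_transform(word_counts)  # per-video topic mixture

# lda.components_[k] is (proportional to) topic k's distribution over
# codebook words, i.e. which discrete words characterize that action.
top_words = lda.components_[0].argsort()[::-1][:5]
print("topic mixture of video 0:", video_topics[0].round(2))
print("topic 0's top codebook words:", top_words)
```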
“…We quantitatively compare our hierarchical LSTM model (H-LSTM) with state-of-the-art approaches on the Watch-n-Patch dataset. To the best of our knowledge, the existing methods evaluated on this dataset are: Hidden Markov Model (HMM) [3], Latent Dirichlet Allocation (LDA) [4], Causal Topic Model (CaTM) [37], and Watch-Bot Topic Model (WBTM) [38]. All results are reported from [38].…”
Section: Comparison With State-of-the-Art
confidence: 99%
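The citing work above names a hierarchical LSTM but does not spell out the architecture. The sketch below is a generic illustration of the hierarchical idea only, not the authors' H-LSTM: a frame-level LSTM encodes each clip, a clip-level LSTM models the clip sequence, and a linear head predicts a per-clip action label. All dimensions are placeholders.

```python
import torch
import torch.nn as nn

class HierLSTM(nn.Module):
    """Generic two-level LSTM sketch (assumed design, not the
    paper's exact H-LSTM)."""
    def __init__(self, feat_dim=64, hidden=128, n_actions=10):
        super().__init__()
        self.frame_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.clip_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x):
        # x: (batch, n_clips, n_frames, feat_dim)
        b, c, f, d = x.shape
        _, (h_n, _) = self.frame_lstm(x.reshape(b * c, f, d))
        clip_enc = h_n[-1].reshape(b, c, -1)   # one vector per clip
        out, _ = self.clip_lstm(clip_enc)      # temporal context across clips
        return self.head(out)                  # (batch, n_clips, n_actions)

logits = HierLSTM()(torch.randn(2, 8, 16, 64))
print(logits.shape)  # torch.Size([2, 8, 10])
```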
“…Robots that act and perceive in a human environment must rely on embedded sensors of different modalities in order to perceive and recognize human activities and to provide help if necessary. Vision is often used as the principal modality in such recognition tasks [12,17], but audio can contribute in these challenging situations by providing additional information. Moreover, a robot does not always have a clear line of sight, and could thus rely on audio to help understand what the human is currently doing.…”
Section: Introduction
confidence: 99%