Action recognition is one of the most important components of video analysis. In addition to objects and atomic actions, temporal relationships are important characteristics of many actions, yet many approaches do not fully exploit them. We model the temporal structures of midlevel actions (referred to as components) based on dense trajectory components, obtained by clustering individual trajectories. The trajectory components are a higher-level and more stable representation than raw individual trajectories. Based on the temporal ordering of trajectory components, we describe the temporal structure using Allen's temporal relationships in a discriminative manner and combine it with a generative model using a bag of components. The main idea behind the model is to extract midlevel features from domain-independent dense trajectories and classify the actions by exploring the temporal structure among these midlevel features based on a set of relationships. We evaluate the proposed approach on public data sets and compare it with a bag-of-words-based approach and a state-of-the-art application of the Markov logic network for action recognition. The results demonstrate that the proposed approach produces better recognition accuracy. © 2014 Wiley Periodicals, Inc.
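To make the role of Allen's temporal relationships concrete, the following minimal sketch classifies the relation between two time intervals. It assumes, purely for illustration, that each trajectory component is summarized by a (start_frame, end_frame) pair; the abstract does not specify how component intervals are defined, and the name `allen_relation` is hypothetical rather than taken from the paper.

```python
from typing import Tuple

# Assumption: a component is summarized by (start_frame, end_frame), start < end.
Interval = Tuple[int, int]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from interval a to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint intervals, with or without a gap between them.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"

    # Intervals that share one or both endpoints.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"

    # Strict containment in either direction.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"

    # Remaining case: partial overlap with all four endpoints distinct.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

Under this assumption, one pairwise relation per pair of components could serve as a discrete temporal feature. How the relations are actually weighted and combined with the bag-of-components model is not described in the abstract and is not reproduced here.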
A highly general and centralized reasoning framework that combines first-order logic with Markov networks is proposed to recognize both simple and complex activities. The generality and systematicity of the reasoning framework are characterized by a newly defined set of spatio-temporal and spatial semantic-free low-level event predicates (LLEs). With the new low-level event predicates, any human activity represented by trajectories can be described without domain knowledge, so the framework can be applied across domains. High-level events (HLEs) of interest across different domains can be described by encoding the newly defined LLEs and temporal logic (Allen's interval logic) in a first-order logic representation. The main contribution is the proposed reasoning framework, represented by a new set of semantic-free LLEs that can be used across different domains. The human action Kinect dataset from Microsoft Research (MSR) is used to evaluate the proposed gesture representation and recognition framework. The capacity to perform across different domains is validated on both the MSR dataset and one synthetic interaction dataset.