Selective spatio-temporal interest points

Reardon

2015 IEEE International Conference on Robotics and Automation (ICRA)

et al. 2015

Abstract: Activity recognition of multi-individuals (ARMI) within a group, which is essential to practical human-centered robotics applications such as childhood education, is a particularly challenging problem that has not previously been well studied. We present a novel adaptive human-centered (AdHuC) representation based on local spatio-temporal features (LST) to address ARMI in a sequence of 3D point clouds. Our human-centered detector constructs affiliation regions to associate LST features with humans by mining depth data and using a cascade of rejectors to localize humans in 3D space. Then, features are detected within each affiliation region, which avoids extracting irrelevant features from dynamic background clutter and addresses moving cameras on mobile robots. Our feature descriptor is able to adapt its support region to linear perspective view variations and encode multi-channel information (i.e., color and depth) to construct the final representation. Empirical studies validate that the AdHuC representation obtains promising performance on ARMI using a Meka humanoid robot to play multi-people Simon Says games. Experiments on benchmark datasets further demonstrate that our adaptive human-centered representation outperforms previous approaches for activity recognition from color-depth data.

Section: Introduction (mentioning)

confidence: 99%

Section: B Local Spatio-temporal Features (mentioning)

confidence: 99%

Adaptive human-centered representation for activity recognition of multiple individuals from 3D point cloud sequences

Reardon

2015 IEEE International Conference on Robotics and Automation (ICRA)

et al. 2015

Computer Vision and Image Understanding

“…Our paper describes a representation scheme akin to space time interest point (STIP) models [1], on which several other approaches build upon [20,21]. Interestingly almost all of these methods [1,22,23] have employed a local spatio-temporal scale-space extrema detection approach.…”

Section: Related Work (mentioning)

confidence: 99%

Action recognition using global spatio-temporal features derived from sparse representations

Somasundaram

Cherian

Morellas

et al. 2014

Recognizing actions is one of the important challenges in computer vision with respect to video data, with applications to surveillance, diagnostics of mental disorders, and video retrieval. Compared to other data modalities such as documents and images, processing video data demands orders of magnitude higher computational and storage resources. One way to alleviate this difficulty is to focus the computations on informative (salient) regions of the video. In this paper, we propose a novel global spatio-temporal self-similarity measure to score saliency using the ideas of dictionary learning and sparse coding. In contrast to existing methods that use local spatio-temporal feature detectors along with descriptors (such as HOG, HOG3D, HOF, etc.), dictionary learning helps consider the saliency in a global setting (on the entire video) in a computationally efficient way. We consider only a small percentage of the most salient (least self-similar) regions found using our algorithm, over which spatio-temporal descriptors such as HOG and region covariance descriptors are computed. The ensemble of such block descriptors in a bag-of-features framework provides a holistic description of the motion sequence which can be used in a classification setting. Experiments on several benchmark datasets in video-based action classification demonstrate that our approach performs competitively with the state of the art.
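
The idea of keeping only the least self-similar blocks can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: instead of a learned dictionary and full sparse coding, it assumes each block is approximated by a single atom taken from the other blocks (one step of matching pursuit), and it scores a block by the fraction of its energy that this approximation fails to explain. The function names and the `fraction` parameter are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def saliency_scores(blocks):
    """Score each block by the share of its energy NOT explained by the
    single most similar other block: 0 = fully self-similar, 1 = unique.
    Assumption: the other blocks themselves act as the dictionary."""
    atoms = [normalize(b) for b in blocks]
    scores = []
    for i, b in enumerate(blocks):
        energy = sum(x * x for x in b)
        if energy == 0:
            scores.append(0.0)
            continue
        # Squared projection onto the best-matching atom (one-atom pursuit).
        best = max((sum(x * a for x, a in zip(b, atom)) ** 2
                    for j, atom in enumerate(atoms) if j != i),
                   default=0.0)
        scores.append(max(0.0, 1.0 - best / energy))
    return scores

def most_salient(blocks, fraction=0.25):
    """Return indices of the top `fraction` of blocks by saliency score."""
    scores = saliency_scores(blocks)
    keep = max(1, int(round(fraction * len(blocks))))
    order = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    return order[:keep]
```

In a pipeline of the kind the abstract describes, descriptors would then be computed only on the blocks returned by `most_salient`; how the blocks are vectorized is left open here.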

“…To begin, we will start developing our theory for spatiotemporal scale selection with respect to the problem of detecting sparse spatio-temporal interest points [6,9,11,14,18,20,21,30,49,88,94,97,99,100,107,122,124,126,127], which may be regarded as a conceptually simplest problem domain because of the sparsity of spatio-temporal interest points and the close connection between this problem domain and the detection of spatial interest points for which there exists a theoretically well-founded and empirically tested framework regarding scale selection over the spatial domain [1,4,5,15,17,25,42,65,72,74,84,89,90,112]. Specifically, using a non-causal Gaussian spatio-temporal scale-space model, we will perform a theoretical analysis of the spatio-temporal scale selection properties of eight different types of spatiotemporal interest point detectors and show that seven of them: (i) the spatial Laplacian of the first-order temporal derivative, (ii) the spatial Laplacian of the second-order temporal derivative, (iii) the determinant of the spatial Hessian of the first-order temporal derivative, (iv) the determinant of the spatial Hessian of the second-order temporal derivative, (v) the determinant of the spatio-temporal Hessian matrix, (vi) the first-order temporal derivative of the determinant of the spatial Hessian matrix and (vii) the second-order temporal derivative of the determinant of the spatial Hessian matrix, do all lead to fully scale-covariant spatio-temporal scale estimates and scale-invariant feature responses under independent scaling transformations of the spatial and the temporal domains.…”

Section: Fig 4 the First- and Second-order Temporal Derivatives Of Th (mentioning)

confidence: 99%

“…Let us approximate the spatial smoothing operation in the continuous spatio-temporal scale-space representation according to (9) by smoothing with the discrete analogue of the Gaussian kernel over the spatial domain [56] T…”

Section: Time-causal and Time-recursive Algorithm For Spatio-temporal (mentioning)

confidence: 99%

Spatio-Temporal Scale Selection in Video Data

Lindeberg

2017

J Math Imaging Vis

This work presents a theory and methodology for simultaneous detection of local spatial and temporal scales in video data. The underlying idea is that if we process video data by spatio-temporal receptive fields at multiple spatial and temporal scales, we would like to generate hypotheses about the spatial extent and the temporal duration of the underlying spatio-temporal image structures that gave rise to the feature responses. For two types of spatio-temporal scale-space representations, (i) a non-causal Gaussian spatio-temporal scale space for offline analysis of pre-recorded video sequences and (ii) a time-causal and time-recursive spatio-temporal scale space for online analysis of real-time video streams, we express sufficient conditions for spatio-temporal feature detectors in terms of spatio-temporal receptive fields to deliver scale-covariant and scale-invariant feature responses. We present an in-depth theoretical analysis of the scale selection properties of eight types of spatio-temporal interest point detectors in terms of either: (i)-(ii) the spatial Laplacian applied to the first- and second-order temporal derivatives, (iii)-(iv) the determinant of the spatial Hessian applied to the first- and second-order temporal derivatives, (v) the determinant of the spatio-temporal Hessian matrix, (vi) the spatio-temporal Laplacian and (vii)-(viii) the first- and second-order temporal derivatives of the determinant of the spatial Hessian matrix. It is shown that seven of these spatio-temporal feature detectors allow for provable scale covariance and scale invariance. Then, we describe a time-causal and time-recursive algorithm for detecting sparse spatio-temporal interest points from video streams and show that it leads to intuitively reasonable results.
An experimental quantification of the accuracy of the spatio-temporal scale estimates and the amount of temporal delay obtained from these spatio-temporal interest point detectors is given, showing that: (i) the spatial and temporal scale selection properties predicted by the continuous theory are well preserved in the discrete implementation and (ii) the spatial Laplacian or the determinant of the spatial Hessian applied to the first- and second-order temporal derivatives leads to much shorter temporal delays in a time-causal implementation compared to the determinant of the spatio-temporal Hessian or the first- and second-order temporal derivatives of the determinant of the spatial Hessian matrix.
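
As a rough illustration of the first detector type listed in this abstract (the spatial Laplacian applied to the first-order temporal derivative), the sketch below computes that response at one fixed spatial and temporal scale. It rests on several assumptions that are not taken from the paper: a sampled, truncated Gaussian kernel rather than the discrete analogue of the Gaussian, non-causal temporal smoothing, central differences for the derivatives, and no scale normalization or scale selection. All function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Sampled 1-D Gaussian truncated at 3 sigma, normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_1d(values, kernel):
    """Filter a list with a symmetric kernel, clamping indices at the ends."""
    r, n = len(kernel) // 2, len(values)
    return [sum(kernel[j + r] * values[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1))
            for i in range(n)]

def smooth_video(video, sigma_space, sigma_time):
    """Separable Gaussian smoothing of video[t][y][x] along x, y and t."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    ks, kt = gaussian_kernel(sigma_space), gaussian_kernel(sigma_time)
    out = [[smooth_1d(row, ks) for row in frame] for frame in video]
    for t in range(T):
        for x in range(W):
            col = smooth_1d([out[t][y][x] for y in range(H)], ks)
            for y in range(H):
                out[t][y][x] = col[y]
    for y in range(H):
        for x in range(W):
            line = smooth_1d([out[t][y][x] for t in range(T)], kt)
            for t in range(T):
                out[t][y][x] = line[t]
    return out

def laplacian_of_temporal_derivative(video, sigma_space=1.0, sigma_time=1.0):
    """Spatial Laplacian of the first-order temporal derivative of the
    smoothed video; border voxels are left at 0."""
    L = smooth_video(video, sigma_space, sigma_time)
    T, H, W = len(L), len(L[0]), len(L[0][0])
    response = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(1, T - 1):
        # First-order temporal derivative by central differences.
        Lt = [[0.5 * (L[t + 1][y][x] - L[t - 1][y][x]) for x in range(W)]
              for y in range(H)]
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                # 5-point stencil for the spatial Laplacian.
                response[t][y][x] = (Lt[y][x - 1] + Lt[y][x + 1] +
                                     Lt[y - 1][x] + Lt[y + 1][x] -
                                     4.0 * Lt[y][x])
    return response

def strongest_point(response):
    """Return ((t, y, x), magnitude) of the largest absolute response."""
    mag, t, y, x = max((abs(v), t, y, x)
                       for t, frame in enumerate(response)
                       for y, row in enumerate(frame)
                       for x, v in enumerate(row))
    return (t, y, x), mag
```

A structure that never changes over time produces no response, because its temporal derivative vanishes, whereas a blob that switches on at some frame produces a peak at the blob's position around the onset frame. A detector in the sense of the abstract would additionally search for extrema over spatial and temporal scales, which this fixed-scale sketch does not attempt.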