A Spatio-Temporal Descriptor Based on 3D-Gradients

Kläser, Alexander; Marszałek, Marcin; Schmid, Cordelia

doi:10.5244/c.22.99

Cited by 1,610 publications

(1,190 citation statements)

References 23 publications

Citation statement types: Supporting, Mentioning (1,177), Contrasting, Unclassified

“…Visual Feature: For all experiments HOG3D features [2], k-means quantized into a 1000-word codebook are used. For all techniques that require visual features, the approximated Histogram Intersection Kernel via feature extension [22] is used to provide higher quality results.…”

Section: Methods (mentioning)

confidence: 99%
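
The pipeline named in this statement can be illustrated with a minimal Python sketch of bag-of-words quantization followed by a histogram intersection kernel. It assumes a codebook already exists (the citing paper learns 1000 words with k-means) and computes the exact kernel rather than the approximated one via feature extension [22]. The function names and toy data are illustrative and are not taken from either paper.

```python
from typing import List, Sequence


def nearest_codeword(descriptor: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest (squared Euclidean) to descriptor."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_words(descriptors: Sequence[Sequence[float]],
                 codebook: Sequence[Sequence[float]]) -> List[float]:
    """L1-normalized histogram of codeword assignments for one video."""
    histogram = [0.0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(histogram)
    return [count / total for count in histogram] if total else histogram


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Exact histogram intersection kernel: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy example: a 3-word codebook in 2-D stands in for 1000 HOG3D codewords.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
video_a = [[0.1, 0.0], [0.9, 0.1], [1.0, 0.0], [0.0, 0.9]]
video_b = [[0.0, 0.1], [0.1, 0.0], [0.0, 1.1], [0.9, 0.0]]
similarity = histogram_intersection(bag_of_words(video_a, codebook),
                                    bag_of_words(video_b, codebook))
print(similarity)  # prints 0.75
```

With L1-normalized histograms the kernel value lies in [0, 1], and it equals 1 only when the two histograms are identical.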

Discovering Video Clusters from Visual Features and Noisy Tags

Vahdat, Zhou, Mori. 2014. Computer Vision – ECCV 2014

Abstract. We present an algorithm for automatically clustering tagged videos. Collections of tagged videos are commonplace, however, it is not trivial to discover video clusters therein. Direct methods that operate on visual features ignore the regularly available, valuable source of tag information. Solely clustering videos on these tags is error-prone since the tags are typically noisy. To address these problems, we develop a structured model that considers the interaction between visual features, video tags and video clusters. We model tags from visual features, and correct noisy tags by checking visual appearance consistency. In the end, videos are clustered from the refined tags as well as the visual features. We learn the clustering through a max-margin framework, and demonstrate empirically that this algorithm can produce more accurate clustering results than baseline methods based on tags or visual features, or both. Further, qualitative results verify that the clustering results can discover sub-categories and more specific instances of a given video category.

“…HOG3D [2]) from video appearance, and then apply a standard clustering algorithm. For instance, Wang et al [3] cluster images strictly based on appearance, and Niebles et al [4] develop topic models based on video bag-of-words approaches.…”

Section: Introduction (mentioning)

confidence: 99%

Discovering Video Clusters from Visual Features and Noisy Tags

Vahdat, Zhou, Mori. 2014. Computer Vision – ECCV 2014

“…Precisely, we use the vertex points generated in a triangular tessellation to obtain a quasi-regular distribution of the orientation bins (see Figure 5b). One alternative yielding completely regular bins is the approach of Klaser et al [34], where points in the sphere surface are projected onto a platonic solid; however, it has a limitation on the number of bins, since the platonic solid with more facets available is the icosahedron (20-sided).…”

Section: Orientation Assignment (mentioning)

confidence: 99%
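
The platonic-solid binning mentioned in this statement can be sketched in Python as follows. This is a simplified illustration, not necessarily the exact procedure of Kläser et al.: it assumes hard assignment of each 3D gradient to the icosahedron face center with the largest projection, weighted by gradient magnitude. The function names are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron_face_centers() -> List[Tuple[float, float, float]]:
    """Unit vectors through the 20 face centers of a regular icosahedron.

    These directions coincide with the vertices of the dual dodecahedron.
    """
    points = [tuple(float(s) for s in signs)
              for signs in product((-1, 1), repeat=3)]
    for a, b in product((-1, 1), repeat=2):
        points.append((0.0, a / PHI, b * PHI))
        points.append((a / PHI, b * PHI, 0.0))
        points.append((a * PHI, 0.0, b / PHI))
    norm = math.sqrt(3.0)  # every point listed above has length sqrt(3)
    return [(x / norm, y / norm, z / norm) for x, y, z in points]


def orientation_histogram(
        gradients: Sequence[Tuple[float, float, float]]) -> List[float]:
    """20-bin histogram: each gradient votes its magnitude into one bin."""
    centers = icosahedron_face_centers()
    histogram = [0.0] * len(centers)
    for gx, gy, gt in gradients:
        magnitude = math.sqrt(gx * gx + gy * gy + gt * gt)
        if magnitude == 0.0:
            continue  # a zero gradient has no orientation
        scores = [gx * cx + gy * cy + gt * cz for cx, cy, cz in centers]
        histogram[scores.index(max(scores))] += magnitude
    return histogram


# Toy gradients (dx, dy, dt); the zero gradient is skipped.
hist = orientation_histogram([(2.0, 2.0, 2.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
print(len(hist))
```

Because the bins come from the faces of a regular solid, their number is fixed by the solid chosen, which is the limitation the citing authors point out.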

A 3D descriptor to detect task-oriented grasping points in clothing

Ramisa, Alenyà, Moreno-Noguer et al. 2016. Pattern Recognition

Manipulating textile objects with a robot is a challenging task, especially because the garment perception is difficult due to the endless configurations it can adopt, coupled with a large variety of colors and designs. Most current approaches follow a multiple re-grasp strategy, in which clothes are sequentially grasped from different points until one of them yields a recognizable configuration. In this work we propose a method that combines 3D and appearance information to directly select a suitable grasping point for the task at hand, which in our case consists of hanging a shirt or a polo shirt from a hook. Our method follows a coarse-to-fine approach in which, first, the collar of the garment is detected and, next, a grasping point on the lapel is chosen using a novel 3D descriptor. In contrast to current 3D descriptors, ours can run in real time, even when it needs to be densely computed over the input image. Our central idea is to take advantage of the structured nature of range images that most depth sensors provide and, by exploiting integral imaging, achieve speed-ups of two orders of magnitude with respect to competing approaches, while maintaining performance. This makes it especially adequate for robotic applications as we thoroughly demonstrate in the experimental section.

“…Based on the successful development of video features, e.g., STIP [1], cuboids [13], and 3D HoG [22], many human activity recognition methods have been developed. Previously, [15] The left figure illustrates our implicit spatial-temporal shape model on a training video.…”

Section: Related Work (mentioning)

confidence: 99%

Propagative Hough Voting for Human Activity Recognition

Yuan, Liu. 2012. Computer Vision – ECCV 2012

Abstract. Hough-transform based voting has been successfully applied to both object and activity detections. However, most current Hough voting methods will suffer when insufficient training data is provided. To address this problem, we propose propagative Hough voting for activity analysis. Instead of letting local features vote individually, we perform feature voting using random projection trees (RPT) which leverages the low-dimension manifold structure to match feature points in the high-dimensional feature space. Our RPT can index the unlabeled testing data in an unsupervised way. After the trees are constructed, the label and spatial-temporal configuration information are propagated from the training samples to the testing data via RPT. The proposed activity recognition method does not rely on human detection and tracking, and can well handle the scale and intra-class variations of the activity patterns. The superior performances on two benchmarked activity datasets validate that our method outperforms the state-of-the-art techniques not only when there is sufficient training data such as in activity recognition, but also when there is limited training data such as in activity search with one query example.
