Abstract: This paper reviews the state of the art in the field of lock-in ToF cameras, their advantages, their limitations, the existing calibration methods, and the ways they are being used, sometimes in combination with other sensors. Even though lock-in ToF cameras provide neither higher resolution nor a larger ambiguity-free range than other range-map estimation systems, advantages such as registered depth and intensity data at a high frame rate, compact design, low weight and reduced power consumption have motivated their increasing use in several research areas, such as computer graphics, machine vision and robotics.
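The ambiguity-free range mentioned above follows directly from the modulation frequency of the emitted signal. The snippet below is a minimal sketch of the standard lock-in ToF relations, not tied to any particular camera in the review: depth is recovered from the measured phase shift, and the unambiguous range is c / (2 f_mod).

```python
# Standard lock-in ToF relations (a sketch; the 20 MHz figure is only an
# illustrative modulation frequency, not taken from the paper).
import math

C = 299_792_458.0  # speed of light [m/s]

def unambiguous_range(f_mod_hz: float) -> float:
    """Maximum depth measurable without phase wrapping, in meters."""
    return C / (2.0 * f_mod_hz)

def depth_from_phase(phase_rad: float, f_mod_hz: float) -> float:
    """Depth corresponding to a measured phase shift, in meters."""
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)

if __name__ == "__main__":
    f_mod = 20e6  # 20 MHz modulation
    print(f"unambiguous range: {unambiguous_range(f_mod):.2f} m")        # ~7.49 m
    print(f"depth at pi/2 phase: {depth_from_phase(math.pi / 2, f_mod):.2f} m")
```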
Markerless 3D human pose detection from a single image is a severely underconstrained problem because different 3D poses can have similar image projections. In order to handle this ambiguity, current approaches rely on prior shape models that can only be correctly adjusted if 2D image features are accurately detected. Unfortunately, although current 2D part detector algorithms have shown promising results, they are not yet accurate enough to guarantee a complete disambiguation of the inferred 3D shape. In this paper, we introduce a novel approach for estimating 3D human pose even when observations are noisy. We propose a stochastic sampling strategy to propagate the noise from the image plane to the shape space. This provides a set of ambiguous 3D shapes, which are virtually indistinguishable from their image projections. Disambiguation is then achieved by imposing kinematic constraints that guarantee the resulting pose resembles a 3D human shape. We validate the method on a variety of situations in which state-of-the-art 2D detectors yield either inaccurate estimations or partly miss some of the body parts.
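As a rough illustration of the sampling-and-filtering strategy outlined in this abstract, the sketch below perturbs noisy 2D detections, lifts each sample to a 3D candidate, and discards candidates that violate simple limb-length constraints. The lifting function and the length bounds are hypothetical placeholders, not the paper's shape model or anthropometric priors.

```python
# Sketch: propagate 2D detection noise into 3D candidates, then filter by
# kinematic (limb-length) constraints. lift_to_3d and LIMB_LENGTH_BOUNDS are
# placeholders for the paper's shape-model-based lifting and priors.
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-joint chain with plausible limb-length bounds in meters (placeholders).
LIMBS = [(0, 1), (1, 2), (2, 3)]
LIMB_LENGTH_BOUNDS = {(0, 1): (0.15, 0.40), (1, 2): (0.20, 0.50), (2, 3): (0.20, 0.50)}

def lift_to_3d(joints_2d: np.ndarray) -> np.ndarray:
    """Placeholder for the shape-model-based 2D-to-3D lifting step."""
    depth = 3.0 + rng.normal(0.0, 0.05, size=(joints_2d.shape[0], 1))
    return np.hstack([joints_2d, depth])

def is_kinematically_valid(pose_3d: np.ndarray) -> bool:
    """Reject candidates whose limb lengths fall outside the allowed bounds."""
    for i, j in LIMBS:
        lo, hi = LIMB_LENGTH_BOUNDS[(i, j)]
        if not lo <= np.linalg.norm(pose_3d[i] - pose_3d[j]) <= hi:
            return False
    return True

def sample_and_filter(joints_2d, sigma=0.02, n_samples=500):
    """Propagate 2D detection noise into a set of plausible 3D pose candidates."""
    candidates = []
    for _ in range(n_samples):
        noisy_2d = joints_2d + rng.normal(0.0, sigma, joints_2d.shape)
        pose_3d = lift_to_3d(noisy_2d)
        if is_kinematically_valid(pose_3d):
            candidates.append(pose_3d)
    return candidates

# Toy input: 2D joints expressed in meters on a fronto-parallel plane.
joints_2d = np.array([[0.0, 0.0], [0.0, 0.30], [0.0, 0.65], [0.0, 1.00]])
print(len(sample_and_filter(joints_2d)), "kinematically valid candidates")
```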
Figure 1: GanHand predicts hand shape and pose for grasping multiple objects given a single RGB image. The figure shows sample results on the YCB-Affordance dataset we propose, the largest dataset of human grasp affordances in real scenes.
The problem of predicting human motion given a sequence of past observations is at the core of many applications in robotics and computer vision. Current state-of-the-art approaches formulate this problem as a sequence-to-sequence task, in which a history of 3D skeletons feeds a Recurrent Neural Network (RNN) that predicts future movements, typically on the order of 1 to 2 seconds. However, one aspect that has been overlooked so far is the fact that human motion is inherently driven by interactions with objects and/or other humans in the environment. In this paper, we explore this scenario using a novel context-aware motion prediction architecture. We use a semantic-graph model in which the nodes parameterize the human and objects in the scene and the edges their mutual interactions. These interactions are iteratively learned through a graph attention layer fed with the past observations, which now include both object and human body motions. Once this semantic graph is learned, we inject it into a standard RNN to predict the future movements of the human(s) and object(s). We consider two variants of our architecture: either freezing the contextual interactions in the future or updating them. A thorough evaluation on the "Whole-Body Human Motion Database" [29] shows that, in both cases, our context-aware networks clearly outperform baselines in which the context information is not considered.
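The following PyTorch sketch illustrates the general idea under assumed dimensions, with a simplified single-head attention standing in for the paper's graph attention layer: the past motion of the human and each object is encoded per node, attention over human-object edges produces a context vector, and a GRU decodes the human's future motion autoregressively.

```python
# Sketch of a context-aware predictor (dimensions and attention layer are
# assumptions, not the paper's exact design).
import torch
import torch.nn as nn

class ContextAwarePredictor(nn.Module):
    def __init__(self, feat_dim=63, hidden=256):
        super().__init__()
        self.node_enc = nn.GRU(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)          # score for each human-object edge
        self.decoder = nn.GRU(feat_dim + hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, past, horizon=25):
        # past: (num_nodes, T, feat_dim); node 0 is the human, the rest are objects.
        _, h = self.node_enc(past)                    # (1, num_nodes, hidden)
        h = h.squeeze(0)                              # (num_nodes, hidden)
        # Attention of the human node (index 0) over every node in the scene.
        pairs = torch.cat([h[0:1].expand_as(h), h], dim=-1)
        w = torch.softmax(self.attn(pairs), dim=0)    # (num_nodes, 1)
        context = (w * h).sum(dim=0, keepdim=True)    # (1, hidden), "frozen" in the future
        # Autoregressive decoding of the human's future motion.
        frame = past[0:1, -1:, :]                     # last observed human frame
        state = h[0:1].unsqueeze(0).contiguous()
        preds = []
        for _ in range(horizon):
            inp = torch.cat([frame, context.unsqueeze(1)], dim=-1)
            out, state = self.decoder(inp, state)
            frame = frame + self.out(out)             # predict a residual motion
            preds.append(frame)
        return torch.cat(preds, dim=1)                # (1, horizon, feat_dim)

model = ContextAwarePredictor()
past_motion = torch.randn(4, 50, 63)                  # 1 human + 3 objects, 50 past frames
future = model(past_motion)                           # (1, 25, 63)
```

Updating the contextual interactions in the future, the paper's second variant, would amount to recomputing the attention weights inside the decoding loop rather than freezing the context vector.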
Detecting grasping points is a key problem in cloth manipulation. Most current approaches follow a multiple re-grasp strategy for this purpose, in which clothes are sequentially grasped from different points until one of them leads to a desired configuration. In this paper, by contrast, we circumvent the need for multiple re-graspings by building a robust detector that identifies the grasping points, generally in one single step, even when clothes are highly wrinkled. In order to handle the large variability a deformed cloth may have, we build a Bag of Features based detector that combines appearance and 3D geometry features. An image is scanned using a sliding window with a linear classifier, and the candidate windows are refined using a non-linear SVM and a "grasp goodness" criterion to select the best grasping point. We demonstrate our approach by detecting collars in deformed polo shirts using a Kinect camera. Experimental results show good performance of the proposed method not only in identifying the same trained textile object part under severe deformations and occlusions, but also the corresponding part in other clothes, exhibiting a degree of generalization.
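The two-stage pipeline can be sketched as below, with a hypothetical feature extractor (extract_bof_features) standing in for the appearance and 3D-geometry Bag of Features, a linear SVM for the sliding-window scan, and an RBF SVM for the refinement; the "grasp goodness" term is omitted here.

```python
# Sketch of the two-stage detection flow: fast linear scan, then non-linear
# re-scoring of the top candidates. Feature extraction and the training data
# are placeholders, not the paper's descriptors or dataset.
import numpy as np
from sklearn.svm import LinearSVC, SVC

def extract_bof_features(window: np.ndarray) -> np.ndarray:
    """Placeholder: a real implementation would quantize local appearance and
    3D geometry descriptors against a learned codebook."""
    hist, _ = np.histogram(window, bins=32, range=(0.0, 1.0), density=True)
    return hist

def sliding_window_candidates(image, linear_clf, win=64, stride=16, keep=20):
    """Stage 1: scan the image with a fast linear classifier."""
    scored = []
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            feat = extract_bof_features(image[y:y + win, x:x + win])
            scored.append((linear_clf.decision_function([feat])[0], (x, y)))
    scored.sort(reverse=True)
    return scored[:keep]

def refine_candidates(image, candidates, rbf_clf, win=64):
    """Stage 2: re-score the top windows with a non-linear SVM and pick the
    best one (a grasp-goodness term could be added to this score)."""
    best = None
    for _, (x, y) in candidates:
        feat = extract_bof_features(image[y:y + win, x:x + win])
        score = rbf_clf.decision_function([feat])[0]
        if best is None or score > best[0]:
            best = (score, (x, y))
    return best

# Tiny synthetic example just to show the flow of the two stages.
rng = np.random.default_rng(0)
X, y = rng.random((200, 32)), rng.integers(0, 2, 200)
linear_clf = LinearSVC().fit(X, y)
rbf_clf = SVC(kernel="rbf").fit(X, y)
image = rng.random((240, 320))
print(refine_candidates(image, sliding_window_candidates(image, linear_clf), rbf_clf))
```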