Figure 1: Our system enables real-time capture of general shapes undergoing non-rigid deformations using a single depth camera. Top left: the object to be captured is scanned while moving rigidly, creating a base template. Bottom left: the object is manipulated, and our method deforms the template to track it. Top and middle rows: reconstructions of upper body, face, and hand sequences captured in different poses as they deform. Bottom row: the corresponding color and depth data for the reconstructed meshes in the middle row.
Abstract: We present a combined hardware and software solution for markerless, real-time reconstruction of non-rigidly deforming physical objects of arbitrary shape. Our system uses a single self-contained stereo camera unit built from off-the-shelf components, together with consumer graphics hardware, to generate spatio-temporally coherent 3D models at 30 Hz. A new stereo matching algorithm computes RGB-D data in real time. We begin by scanning a smooth template model of the subject as it moves rigidly. This geometric surface prior avoids strong scene assumptions, such as a kinematic human skeleton or a parametric shape model. Next, a novel GPU pipeline performs non-rigid registration of live RGB-D data to the smooth template using an extended non-linear as-rigid-as-possible (ARAP) framework. High-frequency details are then fused onto the final mesh using a linear deformation model. The system is an order of magnitude faster than state-of-the-art methods, while matching the quality and robustness of many offline algorithms. We show precise real-time reconstructions of diverse scenes, including: large deformations of users' heads, hands, and upper bodies; fine-scale wrinkles and folds of skin and clothing; and non-rigid interactions performed by users on flexible objects such as toys. We demonstrate how the acquired models can be used in many interactive scenarios, including re-texturing, online performance capture and preview, and real-time shape and motion re-targeting.
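To give a concrete sense of the registration objective, here is a minimal sketch of the classical ARAP energy that the paper's extended non-linear framework builds on. The function name, the simple per-vertex rotation model, and the edge-list mesh representation are illustrative assumptions, not the paper's actual formulation (which additionally handles live RGB-D correspondences on the GPU).

```python
import numpy as np

def arap_energy(rest_verts, deformed_verts, edges, rotations):
    """Classical as-rigid-as-possible energy (illustrative sketch):
    sum over mesh edges (i, j) of
        || (v'_i - v'_j) - R_i (v_i - v_j) ||^2
    where v are rest positions, v' deformed positions, and R_i is a
    per-vertex 3x3 rotation. Small values mean near-rigid deformation."""
    energy = 0.0
    for i, j in edges:
        rest_edge = rest_verts[i] - rest_verts[j]
        cur_edge = deformed_verts[i] - deformed_verts[j]
        residual = cur_edge - rotations[i] @ rest_edge
        energy += float(np.dot(residual, residual))
    return energy

# Usage: a pure translation is perfectly rigid, so the energy is zero.
verts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
edges = [(0, 1), (0, 2), (1, 2)]
rots = [np.eye(3)] * 3
translated = verts + np.array([2.0, 1.0, 0.0])
print(arap_energy(verts, translated, edges, rots))  # → 0.0
```

In a full solver, one alternates between fitting the per-vertex rotations (a local SVD step) and solving a sparse linear system for the deformed positions; the paper's contribution is making an extended variant of this optimization run at 30 Hz.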
Repetitive and ambiguous visual structures pose a severe problem in many computer vision applications. Identifying incorrect geometric relations between images solely from low-level features is not always possible; more global reasoning about the consistency of the estimated relations is required. We propose to exploit the redundancy typically observed in the hypothesized relations for such reasoning, focusing on the graph structure induced by those relations. Chaining the (reversible) transformations over cycles in this graph allows us to build statistics suitable for identifying inconsistent loops, which provide indirect evidence for conflicting visual relations. Inferring the set of likely false-positive geometric relations from these non-local observations is formulated in a Bayesian framework. We demonstrate the utility of the proposed method in several applications, most prominently the computation of structure and motion from images.
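The cycle-chaining idea can be sketched as follows: composing the relative transformations around a loop should yield (near) identity if every edge in the loop is correct, so the deviation from identity is a per-cycle consistency statistic. The 4x4 homogeneous-matrix representation and the Frobenius-norm error used here are illustrative assumptions, not the paper's exact statistic or Bayesian inference step.

```python
import numpy as np

def cycle_error(transforms):
    """Chain a list of relative rigid transforms (4x4 homogeneous
    matrices) around a loop and measure how far the composition
    deviates from identity (Frobenius norm). A correct cycle gives
    an error near zero; a loop containing a wrong relation does not."""
    composed = np.eye(4)
    for t in transforms:
        composed = composed @ t
    return float(np.linalg.norm(composed - np.eye(4)))

def translation(offset):
    """Helper: 4x4 transform for a pure translation."""
    t = np.eye(4)
    t[:3, 3] = offset
    return t

# Usage: close the loop with the exact inverse → consistent cycle.
a = translation([1.0, 0.0, 0.0])
b = translation([0.0, 1.0, 0.0])
c = np.linalg.inv(a @ b)
print(cycle_error([a, b, c]))            # ≈ 0 (consistent loop)
print(cycle_error([a, b, np.eye(4)]))    # large (inconsistent loop)
```

Statistics of this error over many cycles are what feed the Bayesian inference of which individual relations are likely false positives.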