Reconstructing a 3D human-like body from a single RGB image has attracted significant research attention recently. Most existing methods rely on the Skinned Multi-Person Linear Model (SMPL) and can therefore only predict bodies that conform to a single unified template. Moreover, meshes reconstructed by current methods often look plausible from a canonical view but degrade from other views, as the reconstruction process is commonly supervised by only a single view. To address these limitations, this paper proposes a multi-view shape generation network for 3D human-like bodies. Specifically, we propose a coarse-to-fine learning model that gradually deforms a template body toward the ground-truth body. Our model uses multi-view renderings and the corresponding 3D vertex transformations as supervision, which helps generate 3D bodies well aligned to all views. To perform mesh deformation accurately, a Graph Convolutional Network (GCN) is introduced to support shape generation from the 3D vertex representation. In addition, a graph up-pooling operation is designed over the intermediate GCN representations, so our model can generate 3D shapes at higher resolution. Novel loss functions are employed to optimize the whole multi-view generation model, resulting in smoother surfaces. Furthermore, two multi-view human body datasets are produced and contributed to the community. Extensive experiments on the benchmark datasets demonstrate the efficacy of our model over competing methods.
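The abstract does not specify how the graph up-pooling operation works; the minimal PyTorch sketch below illustrates one common realization (edge-midpoint subdivision, as used in Pixel2Mesh-style coarse-to-fine pipelines): a GCN layer refines per-vertex features, then one new vertex is created per mesh edge so the next stage operates on a denser graph. `GraphConv`, `graph_up_pool`, and all tensor shapes here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer: aggregate neighbor features through a
    normalized adjacency matrix, then apply a learned linear projection."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (V, in_dim) per-vertex features; adj: (V, V) normalized adjacency
        return torch.relu(self.linear(adj @ x))

def graph_up_pool(x, edges):
    """Edge-midpoint up-pooling: add one new vertex per mesh edge, with
    features averaged from the edge's two endpoints, so subsequent GCN
    stages refine a higher-resolution mesh (coarse-to-fine deformation)."""
    # edges: (E, 2) long tensor of endpoint indices for each coarse-mesh edge
    midpoints = 0.5 * (x[edges[:, 0]] + x[edges[:, 1]])
    return torch.cat([x, midpoints], dim=0)  # (V + E, feat_dim)

# Tiny usage example on a 4-vertex graph (hypothetical shapes).
V, F = 4, 16
x = torch.randn(V, F)
adj = torch.eye(V)                       # stand-in for a normalized adjacency
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0], [0, 2]])
x = GraphConv(F, F)(x, adj)              # refine features on the coarse graph
x_fine = graph_up_pool(x, edges)         # (V + E, F) = (9, 16): denser graph
```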
We propose a method for Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image, without external pose-annotated real-world training data. While previous works [43,39,4] exploit visual cues in RGB(D) images, our method makes inferences based on the rich geometric information of the object in the depth channel alone. Essentially, our framework exploits this geometric information by learning a unified 3D Orientation-Consistent Representation (3D-OCR) module, further reinforced by the property of reflection symmetry enforced in a Geometry-constrained Reflection Symmetry (GeoReS) module. The object's size and center point are finally estimated by a Mirror-Paired Dimensional Estimation (MPDE) module. Extensive experiments on the category-level NOCS benchmark demonstrate that our framework competes with state-of-the-art approaches that require labeled real-world images. We also deploy our approach on a physical Baxter robot to perform manipulation tasks on unseen but category-known instances, and the results further validate the efficacy of our proposed model.
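The MPDE module itself is learned; the NumPy sketch below only illustrates the underlying geometric intuition: reflecting an observed partial point cloud across an estimated symmetry plane (the property GeoReS enforces) completes unseen geometry, which stabilizes center and size estimates. It assumes the cloud is already in a canonical orientation (as a 3D-OCR-style representation would provide); `mirror_paired_estimate`, `normal`, and `offset` are hypothetical names and parameters, not the paper's API.

```python
import numpy as np

def mirror_paired_estimate(points, normal, offset):
    """Reflect a partial point cloud (N, 3) across the symmetry plane
    {x : normal . x = offset} (unit normal), then estimate the object's
    center and size from the mirror-completed cloud."""
    d = points @ normal - offset                    # signed distance to plane
    reflected = points - 2.0 * d[:, None] * normal  # reflect each point
    completed = np.vstack([points, reflected])      # mirror-paired cloud
    center = completed.mean(axis=0)
    size = completed.max(axis=0) - completed.min(axis=0)  # per-axis extent
    return center, size

# Usage on synthetic data: only half a unit cube is observed, but the cube
# is symmetric about the plane x = 0.5.
rng = np.random.default_rng(0)
half = rng.uniform([0.0, 0.0, 0.0], [0.5, 1.0, 1.0], size=(500, 3))
center, size = mirror_paired_estimate(half, np.array([1.0, 0.0, 0.0]), 0.5)
# center ~ [0.5, 0.5, 0.5] and size ~ [1, 1, 1], recovered from half the cube
```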