Object detection and 6D pose estimation in the crowd (scenes with multiple object instances, severe foreground occlusions and background distractors) has become an important problem in many rapidly evolving technological areas such as robotics and augmented reality. Single shot-based 6D pose estimators with manually designed features are still unable to tackle these challenges, which motivates research into unsupervised feature learning and next-best-view estimation. In this work, we present a complete framework for both single shot-based 6D object pose estimation and next-best-view prediction based on Hough Forests, the state-of-the-art object pose estimator that performs classification and regression jointly. Rather than using manually designed features, we (a) propose an unsupervised feature learnt from depth-invariant patches using a Sparse Autoencoder and (b) offer an extensive evaluation of various state-of-the-art features. Furthermore, taking advantage of the clustering performed in the leaf nodes of Hough Forests, we learn to estimate the reduction of uncertainty in other views, which lets us formulate the problem of selecting the next-best-view. To further improve pose estimation, we propose an improved joint registration and hypotheses verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired by realistic scenarios to extensively evaluate the state of the art and our framework: one relates to domestic environments, and the other depicts a bin-picking scenario mostly found in industrial settings. We show that our framework significantly outperforms the state of the art on both public datasets and our own.
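The next-best-view idea of choosing the view that most reduces uncertainty can be illustrated with a small sketch. It is an illustration under stated assumptions, not the framework's actual formulation. The per-view vote histograms are hypothetical inputs that stand in for whatever uncertainty estimate the learnt leaf-node statistics would provide, and Shannon entropy is our chosen uncertainty measure. The sketch picks the candidate view with the largest predicted entropy drop.

```python
import math
from typing import Mapping, Sequence


def entropy(votes: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a pose-vote histogram after normalisation."""
    total = sum(votes)
    if total <= 0:
        raise ValueError("vote histogram must have positive mass")
    h = 0.0
    for v in votes:
        if v > 0:
            p = v / total
            h -= p * math.log(p)
    return h


def next_best_view(current: Sequence[float],
                   predicted: Mapping[str, Sequence[float]]) -> str:
    """Return the candidate view whose *hypothetical* predicted vote histogram
    yields the largest entropy reduction relative to the current view.
    Ties are broken in favour of the first candidate encountered."""
    if not predicted:
        raise ValueError("need at least one candidate view")
    h_current = entropy(current)
    return max(predicted, key=lambda view: h_current - entropy(predicted[view]))
```

For example, with `current = [5, 5, 5, 5]` and `predicted = {"left": [10, 10, 0, 0], "top": [18, 1, 1, 0]}`, the sketch selects `"top"`, because its more sharply peaked histogram gives the larger drop in entropy.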
Abstract: The performance of automatic 3-D face recognition can be significantly improved by coping with the nonrigidity of the facial surface. In this paper, we propose a geodesic polar parameterization of the face surface. With this parameterization, the intrinsic surface attributes do not change under isometric deformations; the proposed representation is therefore appropriate for expression-invariant 3-D face recognition. We also consider the special case of an open mouth, which violates the isometry assumption, and propose a modified geodesic polar parameterization that also leads to an invariant representation. Based on this parameterization, 3-D face recognition is reduced to the classification of expression-compensated 2-D images, which can be classified with state-of-the-art algorithms. Experimental results verify the theoretical assumptions and demonstrate the benefits of the geodesic polar parameterization for 3-D face recognition.
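The radial coordinate of a geodesic polar parameterization is the geodesic distance from a chosen pole on the surface. The sketch below approximates it with Dijkstra's algorithm over mesh edges, with each edge weighted by its Euclidean length. The triangle-mesh input format and the choice of pole (for example, a nose-tip vertex) are assumptions. The angular coordinate and the open-mouth modification are not covered. The point is to show why the representation tolerates isometric deformations: bending a surface without stretching it leaves the edge lengths, and hence these distances, unchanged.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def geodesic_distances(vertices: Sequence[Point],
                       edges: Sequence[Tuple[int, int]],
                       pole: int) -> List[float]:
    """Approximate geodesic distance from `pole` to every vertex by running
    Dijkstra's algorithm on the edge graph (edge weight = Euclidean length)."""
    adjacency: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        w = math.dist(vertices[a], vertices[b])
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))

    dist = [math.inf] * len(vertices)
    dist[pole] = 0.0
    heap = [(0.0, pole)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

For instance, a straight chain of unit-length edges and the same chain bent into a U shape (same edge lengths) give identical distances `[0.0, 1.0, 2.0, 3.0]` from the first vertex, even though the Euclidean distance to the last vertex drops from 3 to 1.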