This paper proposes a method for estimating the 6D pose of an object grasped by a robot hand, using RGB cameras mounted on the palm and visuotactile sensors installed at the fingertips. Combining the two sensor types lets the method handle objects made from a wide range of materials. The method allows a robot to perform in-hand pose estimation while holding the object, eliminating the need for preparatory actions or particular environmental backgrounds. The underlying mechanism combines deep-learning-based background subtraction with denoising auto-encoder-based sensor fusion. With the poses estimated by the proposed method, a robot controller can correct for grasping uncertainty and adjust the robot's motion to move an object toward required goals with satisfactory accuracy. We conduct various studies and analyses in the experimental section to understand the proposed method's advantages and disadvantages. The results demonstrate the benefits of the proposed sensor combination and mechanism. They also provide essential knowledge for readers considering a similar configuration for estimating object poses.