Object detection and pose estimation are essential requirements for many robotic grasping and manipulation applications, enabling robots to grasp objects with different properties in cluttered scenes and under varying lighting conditions. This work proposes the i2c-net framework to extract the 6D pose of multiple objects belonging to different categories, starting from an instance-level pose estimation network and relying only on RGB images. The network is trained on a custom-made, photo-realistic synthetic dataset generated from a set of base CAD models that are suitably deformed and enriched with real textures for domain randomization. At inference time, the instance-level network is combined with a 3D mesh reconstruction module to achieve category-level capabilities. Depth information is used in post-processing as a correction step. Tests on real objects from the YCB-V and NOCS-REAL datasets demonstrate the high accuracy of the proposed approach.