“…Some of these methods use visual features in 2D images to localize graspable regions [2,3], while others use range or depth data for this purpose [4,5,6,7,8], with the latter becoming more popular owing to the availability of low-cost RGBD sensors. Recently, deep learning-based methods have become increasingly popular for detecting graspable regions [9,10,11,12,13]. Most existing methods for vision-based grasping fall into two broad categories: those that rely on the availability of accurate geometric information about the object (such as a CAD model) [14,15,16], which makes them impractical in many real-world use cases, and those that compute grasping affordances directly from an RGBD point cloud by exploiting local geometric features, without knowing the object's identity or its accurate 3D geometry [6,17,18,19].…”
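To make the second category concrete, the sketch below illustrates the general idea of scoring grasp-candidate points on a raw point cloud using only local geometric features (surface normals and a PCA-based curvature estimate), with no object model or identity. It is not taken from any of the cited works; it is a minimal illustration assuming the Open3D library, and the function name, thresholds, and file path are illustrative choices.

```python
# Minimal sketch (assumed, not from the cited works): model-free detection of
# grasp-candidate points from local geometry alone, using Open3D and NumPy.
import numpy as np
import open3d as o3d


def local_grasp_candidates(pcd, radius=0.02, max_curvature=0.05, min_neighbors=10):
    """Return indices of points whose local neighborhood is flat enough to
    serve as a naive grasp-contact candidate (illustrative heuristic only)."""
    # Estimate surface normals from local neighborhoods of the raw cloud.
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=radius, max_nn=30))
    pts = np.asarray(pcd.points)
    tree = o3d.geometry.KDTreeFlann(pcd)
    candidates = []
    for i, p in enumerate(pts):
        # Gather the local neighborhood around each point.
        _, idx, _ = tree.search_radius_vector_3d(p, radius)
        idx = np.asarray(idx)
        if len(idx) < min_neighbors:
            continue  # too sparse to judge local geometry reliably
        nbrs = pts[idx] - pts[idx].mean(axis=0)
        # PCA "surface variation": smallest eigenvalue of the local covariance
        # over the eigenvalue sum approximates curvature; small values mean
        # locally flat patches, a common cue for stable finger contacts.
        eigvals = np.linalg.eigvalsh(nbrs.T @ nbrs / len(idx))
        curvature = eigvals[0] / max(eigvals.sum(), 1e-12)
        if curvature < max_curvature:
            candidates.append(i)
    return candidates


# Example usage on a depth-sensor scan (path is illustrative):
# pcd = o3d.io.read_point_cloud("scene.pcd")
# idx = local_grasp_candidates(pcd)
```

Published methods in this category, such as those cited above, typically go further by sampling and ranking full 6-DoF grasp poses rather than single contact points; the sketch only conveys the core idea of operating on local geometry without a known object model.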