“…Empirical approaches, on the other hand, learn to predict the quality of grasp candidates from data on a diverse set of objects, images, and grasp attempts collected through human labeling [19], [20], [21], [22], self-supervision [23], [24], or simulated data [25], [26], [3], [27], [1]. Saxena et al. [19] trained a classifier on human-labeled RGB images to predict grasp points, triangulated the points on stereo RGB images, and demonstrated successful grasps on a limited set of household objects, including some transparent and specular objects.…”