“…One of the early works in this direction focused on the detection of graspable object areas by building local visual descriptors of grasping points and estimating the probability that a graspable object is present, modelling each grasp attempt as a Bernoulli trial (Montesano & Lopes, 2009). Newer approaches employ Convolutional Neural Network (CNN) models to produce classes of functional object parts from RGB (Nguyen et al, 2017; Do et al, 2018; Sawatzky et al, 2017) and synthetic data (Kokic et al, 2017). However, depth cues combined with RGB information have demonstrated greater detection accuracy in this task (Nguyen et al, 2016;…”

[Table: input data modality of the surveyed works]
Montesano & Lopes, 2009: 2D & Keypoints
Zhao & Zhu, 2013: 2D & 3D
Myers et al, 2015: 2.5D
Nguyen et al, 2016: 2.5D
Nguyen et al, 2017: 2D
Kokic et al, 2017: Synthetic
Sawatzky et al, 2017: 2D & Keypoints
Do et al, 2018: 2D
Wang & Tarr, 2020: 2D
Deng et al, 2021: 3D
Xu et al, 2021: 2.5D & Keypoints
Turek et al, 2010: 2D
Qi et al, 2018: 3D
Kjellström et al, 2011: 2D
Yao et al, 2013: 3D
Qi et al, 2017: 2.5D
Gkioxari et al, 2018: 2D
Fang et al, 2018: 2D
Chuang et al, 2018: 2D
Tan et al, 2019: 2D
Wu et al, 2020: Synthetic
Hou et al, 2021: 2D
Sridhar et al, 2008: 2D
Aksoy et al, 2010: 2D
Aksoy et al, 2011: 2D
Pieropan et al, 2013: 2.5D
Pieropan et al, 2014: 2D
Moldovan & De Raedt, 2014: Synthetic
Liang et al, 2016: 2.5D
Liang et al, 2018: 2.5D
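To make the Bernoulli-trial idea concrete, the following is a minimal sketch of how a per-descriptor grasp probability could be estimated with a Beta-Bernoulli model. This is an illustrative formulation only, not the exact method of Montesano & Lopes (2009); the function name, the `outcomes` list, and the prior pseudo-counts `alpha0`/`beta0` are assumptions introduced here.

```python
def grasp_probability(outcomes, alpha0=1.0, beta0=1.0):
    """Posterior mean estimate of P(graspable) for one local descriptor.

    Each grasp attempt is treated as a Bernoulli trial; the unknown
    success probability gets a Beta(alpha0, beta0) prior (uniform by
    default), so the posterior mean is a smoothed success frequency.

    outcomes: iterable of 0/1 results of grasp trials (illustrative).
    """
    outcomes = list(outcomes)
    successes = sum(outcomes)
    # Beta posterior mean: (alpha0 + successes) / (alpha0 + beta0 + trials)
    return (alpha0 + successes) / (alpha0 + beta0 + len(outcomes))

# Example: 7 successful grasps out of 10 trials under a uniform prior
# gives a posterior mean of (1 + 7) / (2 + 10) = 2/3.
p = grasp_probability([1, 1, 0, 1, 1, 1, 0, 1, 0, 1])
```

The smoothing from the prior keeps the estimate away from 0 or 1 after only a few trials, which is the practical reason to prefer it over the raw success frequency.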