“…These methods are easy to understand but often suffer efficiency problems due to 3D operations. Motivated by the superior performance of deep learning technology on detection or segmentation tasks (Cheon et al, 2022 ; Huang et al, 2022 ; Khan et al, 2022 ), image-based deep models have become popular for grasp detection (Chu et al, 2018 ; Zhang et al, 2019 ; Dong et al, 2021 ; Yu et al, 2022a ). These methods often use a rectangle representation g = ( x, y, h, w , θ), where ( x, y ) is the center pixel location of a grasp candidate, ( h, w ) are height and width of the gripper, and θ is the rotation of the gripper.…”