Fast Generation of High-Fidelity RGB-D Images by Deep Learning With Adaptive Convolution

Xian, Chuhua; Zhang, Dongjiu; Dai, Chengkai; Wang, Charlie C. L.

doi:10.1109/tase.2020.3002069

Cited by 5 publications

(1 citation statement)

References 55 publications

“…Recently, deep-learning methods have demonstrated enormous potential for depth completion. For example, Xian et al. (2020) introduced an adaptive convolution method with three cascaded modules to address low-resolution and missing regions from indoor scenes. Hu et al. (2021) proposed a dual-branch convolutional neural network (CNN) that fuses a color image and sparse depth map to generate dense outdoor depths.…”

Section: Related Work (mentioning)

confidence: 99%

ClueDepth Grasp: Leveraging positional clues of depth for completing depth of transparent objects

Hong, Chen, Yu, et al. 2022

Front. Neurorobot.

Obtaining accurate depth information is key to robot grasping tasks. However, RGB-D cameras have difficulty perceiving transparent objects owing to the objects' refraction and reflection properties. These properties make it difficult for humanoid robots to perceive and grasp everyday transparent objects. To remedy this, existing studies usually remove transparent object areas and use a model that learns patterns from the remaining opaque areas to complete the depth estimates. Notably, this frequently leads to deviations from the ground truth. In this study, we propose a new depth completion method, ClueDepth Grasp (CDGrasp), that works more effectively with transparent objects in RGB-D images. Specifically, we propose a ClueDepth module, which uses a geometric method to filter out refractive and reflective points while preserving the correct depths, consequently providing crucial positional clues for object location. To acquire sufficient features to complete the depth map, we design a DenseFormer network that integrates DenseNet to extract local features and swin-transformer blocks to obtain the required global information. Furthermore, to fully utilize the information obtained from multi-modal visual maps, we devise a Multi-Modal U-Net Module to capture multiscale features. Extensive experiments conducted on the ClearGrasp dataset show that our method achieves state-of-the-art performance in terms of accuracy and generalization of depth completion for transparent objects, and its successful use in humanoid robot grasping verifies the efficacy of our proposed method.
