2023
DOI: 10.3390/s23063340

Bilateral Cross-Modal Fusion Network for Robot Grasp Detection

Abstract: In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This architecture facilitates bilateral interaction between RGB and depth information and is designed to efficiently aggregate multiscale information. Our novel modal interaction module (MIM) with a spatial-wise cross-attent…
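
The abstract describes a modal interaction module (MIM) built around spatial-wise cross-attention between RGB and depth streams. The paper's exact design is not visible in the truncated abstract, so the following is only a minimal sketch of what such a spatial cross-attention fusion block could look like; the module name, shapes, and the 1x1-convolution attention heads are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a spatial-wise cross-attention fusion block for
# RGB-D feature maps, in the spirit of the paper's modal interaction module.
# All names and design choices here are assumptions, not the paper's code.
import torch
import torch.nn as nn


class SpatialCrossAttentionFusion(nn.Module):
    """Fuse RGB and depth feature maps via spatial cross-attention.

    Each modality produces a per-pixel attention map that reweights the
    *other* modality, so depth cues can gate RGB features and vice versa
    (assumed behaviour for illustration).
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs squeeze each modality to a single-channel spatial map in [0, 1].
        self.rgb_to_attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.depth_to_attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        # 1x1 conv merges the two cross-modulated streams back to `channels`.
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # Attention derived from depth modulates RGB features, and vice versa.
        rgb_mod = rgb_feat * self.depth_to_attn(depth_feat)
        depth_mod = depth_feat * self.rgb_to_attn(rgb_feat)
        return self.merge(torch.cat([rgb_mod, depth_mod], dim=1))


if __name__ == "__main__":
    fuse = SpatialCrossAttentionFusion(channels=64)
    rgb = torch.randn(1, 64, 56, 56)    # RGB feature map
    depth = torch.randn(1, 64, 56, 56)  # depth feature map
    print(fuse(rgb, depth).shape)       # torch.Size([1, 64, 56, 56])
```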

Cited by 2 publications (1 citation statement). References 42 publications (88 reference statements).

“…Zhang et al. [59] proposed a hybrid Transformer-CNN method for 2-DoF object pose detection. They further proposed a bilateral neural network architecture [60] for RGB and depth image fusion and achieved promising results. In the 6-DoF pose detection area, Wang et al. [6] introduced the DenseFusion framework for precise 6-DoF pose estimation using two data sources and a dense fusion network.…”
Section: Multi-modal Data Based Object Pose Estimation
confidence: 99%