Accurately detecting suitable grasp areas for unknown objects from visual information remains a challenging task. Drawing inspiration from the success of the Vision Transformer in visual detection, a hybrid Transformer-CNN architecture for robotic grasp detection, named HTC-Grasp, is developed to improve the accuracy of grasping unknown objects. The architecture employs an external attention-based hierarchical Transformer as an encoder to effectively capture global context and correlation features across the entire dataset. Furthermore, a channel-wise attention-based CNN decoder is presented to adaptively adjust the channel weights, resulting in more efficient feature aggregation. The proposed method is validated on the Cornell and Jacquard datasets, achieving image-wise detection accuracies of 98.3% and 95.8%, respectively, and object-wise detection accuracies of 96.9% and 92.4% on the same datasets. A physical experiment is also performed using an Elite 6-DoF robot, with a grasping success rate of 93.3%, demonstrating the proposed method's ability to grasp unknown objects in real scenarios. The results of this study indicate that the proposed method outperforms other state-of-the-art methods.
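The abstract does not give the exact formulation of the channel-wise attention in the decoder; a minimal sketch of the common squeeze-and-excitation pattern it resembles is shown below (the function name, the reduction ratio implied by the weight shapes, and the weights themselves are all hypothetical, not taken from the paper):

```python
import numpy as np

def channel_attention(feats, w1, w2):
    """SE-style channel attention: re-weight each channel by a learned gate.

    feats: (C, H, W) feature map.
    w1:    (C//r, C) bottleneck projection (r = reduction ratio).
    w2:    (C, C//r) expansion back to C channels.
    """
    c = feats.shape[0]
    squeezed = feats.reshape(c, -1).mean(axis=1)   # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate in (0, 1) per channel
    return feats * gate[:, None, None]             # channel-wise re-weighting
```

Because the gate lies in (0, 1), each channel is attenuated in proportion to its learned importance, which is one way a decoder can "adaptively adjust the channel weights" during feature aggregation.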
The location and capacity of express distribution centers and the allocation of delivery points are mixed-integer programming problems modeled as capacitated location and allocation problems (CLAPs), which may be constrained by the position and capacity of distribution centers and the assignment of delivery points. The solution representation significantly impacts search efficiency when applying swarm-based algorithms to CLAPs. In a traditional encoding scheme, the solution is a direct representation of the position, capacity, and assignment of the plan, and the constraints are handled by penalty terms. However, solutions that violate the constraints are still evaluated during the search process, which reduces search efficiency. A general encoding scheme that uses a vector of uniform-range elements is proposed to eliminate the effect of constraints. In this encoding scheme, the number of distribution centers is dynamically determined during the search process, and the capacity of distribution centers and the allocation of delivery points are determined by the random proportion and random key of the elements in the encoded solution vector. The proposed encoding scheme is verified on particle swarm optimization, differential evolution, artificial bee colony, and a powerful differential evolution variant, and the performances are compared to those of the traditional encoding scheme. Numerical examples with up to 50 delivery points show that the proposed encoding scheme boosts the performance of all algorithms without altering any operator of the algorithm.
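The abstract describes the decoding only at a high level; the sketch below illustrates the general idea of decoding a uniform [0, 1] vector into an always-feasible plan, with random keys mapping delivery points to opened centers. The threshold, vector layout, and function name are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def decode_solution(x, n_centers, n_points):
    """Decode a uniform [0, 1] vector into a feasible CLAP plan (illustrative).

    x[:n_centers]  -- a candidate center is opened when its element exceeds 0.5,
                      so the number of centers is determined by the vector itself.
    x[n_centers:]  -- random keys, one per delivery point, mapped onto the
                      currently open centers, so every decoded assignment is valid.
    """
    open_idx = np.flatnonzero(x[:n_centers] > 0.5)
    if open_idx.size == 0:                         # guarantee at least one open center
        open_idx = np.array([int(np.argmax(x[:n_centers]))])
    keys = x[n_centers:n_centers + n_points]
    # scale each key into [0, n_open) and floor it to pick an open center
    slot = np.minimum((keys * open_idx.size).astype(int), open_idx.size - 1)
    return open_idx, open_idx[slot]
```

Because the decoder only ever produces assignments to centers that are actually open, no penalty terms are needed and every evaluated solution is feasible, which is the source of the claimed efficiency gain.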
In modern integrated circuit manufacturing processes, wafers are continually transported from one procedure to another. To reduce the risk of dust contamination, a Front Opening Unified Pod (FOUP) load-port system is commonly adopted. Misplaced wafers should be detected before being transported. Traditional methods often fail to detect wafer states correctly. To improve detection accuracy, this paper proposes a vision-based method: an approach for detecting wafer overlap and malposition based on a modified YOLO-V3 algorithm. Experimental results show the superiority of the proposed approach.
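Detection approaches in the YOLO family are typically evaluated by matching predicted boxes to ground truth via intersection-over-union (IoU); the abstract does not state its evaluation protocol, but the standard IoU computation can be sketched as follows (boxes given as `(x1, y1, x2, y2)` corners):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])    # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])    # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```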
In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we propose a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This architecture facilitates the interaction of bilateral RGB and depth information and is designed to efficiently aggregate multiscale information. Our novel modal interaction module (MIM), with a spatial-wise cross-attention algorithm, adaptively captures cross-modal feature information. Meanwhile, the channel interaction modules (CIMs) further enhance the aggregation of the different modal streams. In addition, we efficiently aggregate global multiscale information through a hierarchical structure with skip connections. To evaluate the performance of our proposed method, we conducted validation experiments on standard public datasets and real robot grasping experiments. We achieved image-wise detection accuracies of 99.4% and 96.7% on the Cornell and Jacquard datasets, respectively; object-wise detection accuracies reached 97.8% and 94.6% on the same datasets. Furthermore, physical experiments using a 6-DoF Elite robot demonstrated a success rate of 94.5%. These experiments highlight the superior accuracy of our proposed method.
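The abstract names a spatial-wise cross-attention mechanism without giving its equations; a minimal sketch of the standard cross-attention pattern it likely builds on, where one modality provides the queries and the other the keys and values, is shown below (the unprojected single-head form and the function name are simplifying assumptions):

```python
import numpy as np

def spatial_cross_attention(rgb, depth):
    """RGB positions attend over depth positions (single-head, unprojected sketch).

    rgb, depth: (N, d) feature maps flattened over the N = H*W spatial positions.
    Returns depth features aggregated per RGB query position.
    """
    d = rgb.shape[1]
    scores = rgb @ depth.T / np.sqrt(d)            # (N, N) cross-modal affinities
    scores -= scores.max(axis=1, keepdims=True)    # subtract row max for stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over depth positions
    return attn @ depth                            # weighted sum of depth features
```

Each output row is a convex combination of depth features, so every RGB spatial location receives a depth descriptor weighted by cross-modal similarity, which is the sense in which such a module "adaptively captures cross-modal feature information".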