When robots are used to handle end-of-life cars, the vehicle must be grasped by its door frame, so fast and accurate localization of the door frame position is key to automating the grasping process. Traditional methods for locating and grasping scrap cars rely heavily on manual operation and suffer from low grasping efficiency and poor accuracy. Therefore, this paper proposes a binocular vision robot vehicle door frame spatial localization method based on an improved YOLOv4. The method proposes a lightweight and efficient feature-fusion target detection network for complex environments, and combines the detection results with an improved SURF feature matching method to locate the vehicle door frame. To simplify the network structure, MobileNetv3 replaces the CSPDarknet53 backbone, and depthwise separable convolution is used throughout the network. To improve the sensitivity of the network to vehicle door frame targets in complex environments, an improved convolutional block attention module is added to PANet, and adaptive spatial feature fusion (ASFF) is introduced so that features at different scales are fused more effectively. Compared with YOLOv4, the number of network parameters is reduced by 73.8%, the mAP is improved by 1.35%, and the detection speed is increased by 28.7%. The experimental results show that the positioning accuracy of the system is 0.745 mm, which satisfies the requirement that the door frame positioning error be less than 1 cm. The method is also compared with other network models; the results show that it achieves a good balance between detection speed and detection accuracy and handles the task of identifying vehicle door frames in complex environments with good detection results.
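As a rough illustration of the lightweight design mentioned above, the sketch below implements a depthwise separable convolution block in PyTorch, the building block that MobileNetv3-style backbones use in place of standard convolutions. The channel sizes, activation choice, and layer layout are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3
    convolution followed by a 1x1 pointwise convolution, which greatly
    reduces parameters compared with a standard convolution."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise: 1x1 convolution that mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.act = nn.Hardswish()  # activation used in MobileNetv3 blocks

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return x

# Example: a 52x52 feature map with 128 channels, as might appear in a
# detection neck; the channel counts here are illustrative only.
feat = torch.randn(1, 128, 52, 52)
out = DepthwiseSeparableConv(128, 256)(feat)
print(out.shape)  # torch.Size([1, 256, 52, 52])
```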
Recent approaches based on convolutional neural networks significantly improve the performance of structured light image depth estimation in structured light 3D measurement. However, it remains challenging to simultaneously preserve the global structure and local details of objects in structured light images of complex scenes. In this paper, we design a parallel CNN-Transformer network (PCTNet), which consists of a CNN branch, a Transformer branch, a bidirectional feature fusion module (BFFM), and a cross-feature multi-scale fusion module (CFMS). The BFFM and CFMS modules are proposed to fuse the local and global features of the two branches in order to achieve better depth estimation. Comprehensive experiments are conducted to evaluate our model on four structured light datasets, i.e., our established simulated fringe and speckle structured light datasets and public real fringe and speckle structured light datasets. The experiments demonstrate that the proposed PCTNet is an effective architecture, achieving state-of-the-art performance in both qualitative and quantitative evaluation for structured light 3D measurement.
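For intuition about fusing the two branches, the following is a minimal sketch of one plausible bidirectional exchange between a CNN feature map (local detail) and a Transformer feature map (global context). The actual BFFM and CFMS designs are not specified here, so the module name, channel counts, and fusion rule are all assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SimpleBidirectionalFusion(nn.Module):
    """Illustrative two-way fusion of a CNN feature map (local detail) and a
    Transformer feature map (global context): each branch is enriched with a
    projection of the concatenated features. This is a generic sketch, not
    the BFFM/CFMS design from the paper."""
    def __init__(self, channels):
        super().__init__()
        self.to_cnn = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.to_trans = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, cnn_feat, trans_feat):
        # Both inputs are assumed already aligned in spatial size and channels.
        mixed = torch.cat([cnn_feat, trans_feat], dim=1)
        cnn_out = cnn_feat + self.to_cnn(mixed)        # local branch gains global cues
        trans_out = trans_feat + self.to_trans(mixed)  # global branch gains local detail
        return cnn_out, trans_out

# Hypothetical 1/8-resolution features from a 512x512 structured light image.
cnn_feat = torch.randn(1, 64, 64, 64)
trans_feat = torch.randn(1, 64, 64, 64)
c, t = SimpleBidirectionalFusion(64)(cnn_feat, trans_feat)
print(c.shape, t.shape)
```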
Deep learning based on convolutional neural networks (CNNs) has attracted increasing attention in phase unwrapping for fringe projection three-dimensional (3D) measurement. However, due to the inherent limitations of the convolution operator, it is difficult to accurately determine the fringe order in wrapped phase patterns, a task that relies on continuity and global context. To address this problem, in this paper we develop a hybrid CNN-transformer model (Hformer) dedicated to phase unwrapping via fringe order prediction. The proposed Hformer model has a hybrid CNN-transformer architecture mainly composed of a backbone, an encoder, and a decoder, so as to take advantage of both CNNs and transformers. The backbone serves as a feature extractor for wrapped phase patterns. The encoder and decoder with cross attention are designed to enhance the global dependency needed for fringe order prediction. Experimental results show that the proposed Hformer model achieves better performance in fringe order prediction than CNN models such as U-Net and DCNN. Our work offers an alternative to the CNN-dominated deep learning phase unwrapping of fringe projection 3D measurement.
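For reference, once the fringe order k is known for each pixel, the absolute phase follows from the standard relation φ_abs = φ_wrapped + 2πk. The short NumPy sketch below applies this relation to synthetic data; the network-predicted fringe order is replaced here by a ground-truth order purely for illustration.

```python
import numpy as np

def unwrap_phase(wrapped_phase, fringe_order):
    """Recover the absolute phase from a wrapped phase map and a per-pixel
    fringe order map: phi_abs = phi_wrapped + 2*pi*k. In the Hformer setting,
    k would be the fringe order predicted by the network."""
    return wrapped_phase + 2.0 * np.pi * fringe_order

# Toy example: a linearly increasing absolute phase wrapped into (-pi, pi].
x = np.linspace(0, 8 * np.pi, 1000)
wrapped = np.angle(np.exp(1j * x))             # wrapped phase in (-pi, pi]
order = np.round((x - wrapped) / (2 * np.pi))  # ground-truth fringe order
recovered = unwrap_phase(wrapped, order)
print(np.allclose(recovered, x))  # True
```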