In autonomous driving, recognition of the front vehicle's taillights plays a key role in predicting the intentions of the vehicle ahead. To accurately identify front-vehicle taillights, we first analyze the distinct characteristics of vehicle taillight signals and then propose an improved taillight recognition model based on YOLOv5s. First, a coordinate attention (CA) module is inserted into the backbone network of the YOLOv5s model to improve small-target recognition and reduce interference from other light sources. Then, the EIOU loss is used to address the class imbalance problem. Finally, EIOU-NMS is used to resolve erroneous suppression of anchor boxes during recognition. We conduct ablation experiments on real-scene video and a vehicle taillight dataset to verify the effectiveness of the improved algorithm. The experimental results show that the mAP of the model is 9.2% higher than that of YOLOv5s.

INDEX TERMS Autonomous driving, vehicle taillight recognition, ablation experiment.
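The EIOU loss referred to above extends the IoU loss with three penalty terms: the normalized center distance and the separate width and height discrepancies relative to the smallest enclosing box. A minimal single-pair sketch is below; this is not the authors' implementation, and the `(x1, y1, x2, y2)` box format and `eps` handling are assumptions for illustration:

```python
def eiou_loss(box_a, box_b, eps=1e-9):
    """EIOU loss between two axis-aligned boxes in (x1, y1, x2, y2) format.

    L_EIOU = 1 - IoU
             + rho^2(centers) / c^2        (center-distance term)
             + (w_a - w_b)^2 / c_w^2       (width term)
             + (h_a - h_b)^2 / c_h^2       (height term)
    where c_w, c_h are the width/height of the smallest enclosing box
    and c is its diagonal length.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)

    # Smallest enclosing box and its squared diagonal.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between box centers.
    rho2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2
            + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4.0

    # Width/height discrepancy terms.
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    return (1.0 - iou
            + rho2 / c2
            + (wa - wb) ** 2 / (cw ** 2 + eps)
            + (ha - hb) ** 2 / (ch ** 2 + eps))
```

In EIOU-NMS, the same EIOU quantity can replace plain IoU when deciding whether one detection suppresses another, which is what mitigates the erroneous suppression of nearby taillight boxes.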
In skeleton-based human action recognition, the Transformer, which models correlations between joint pairs over the global topology, has achieved remarkable results. However, in contrast to the many GCN studies on learning graph topology, Transformer self-attention ignores the topology of the skeleton graph when capturing dependencies between joints. To address this problem, we propose a novel two-stream spatial Graphormer network (2s-SGR), which models joint and bone information using self-attention augmented with structural encodings. First, in the joint stream, while the Transformer models joint correlations over the global spatial topology, the topology of the joints and the edge information of the bones are introduced into the self-attention through custom structural encodings; joint motion information is modeled in spatial-temporal blocks at the same time. The added structural and motion information effectively captures the dependencies of joints across frames and enhances the feature representation. Second, for the second-order information of the skeleton, the bone stream adapts to the bone structure by adjusting the custom structural encodings. Finally, the global spatial-temporal features of joints and bones are fused and fed into the classification network to obtain the action recognition result. Extensive experiments on three large-scale datasets, NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics, demonstrate that the proposed 2s-SGR achieves state-of-the-art performance, and ablation experiments validate the effectiveness of its components.
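The core idea of injecting skeleton topology into self-attention can be sketched generically: a structural encoding (e.g. indexed by graph hop distance between joints) is added as a bias to the attention logits before the softmax. The sketch below is a single-head toy version under that assumption, not the authors' 2s-SGR encodings; the chain-graph bias and all dimensions are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def structural_self_attention(X, Wq, Wk, Wv, spatial_bias):
    """Single-head self-attention over joints with an additive
    structural bias on the attention logits (Graphormer-style).

    X            : (N, d_in)  joint features for one frame
    spatial_bias : (N, N)     bias indexed by the graph relation
                              between joints (e.g. hop distance)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k) + spatial_bias  # topology enters here
    attn = softmax(logits, axis=-1)                 # each row sums to 1
    return attn @ V

# Toy example: 4 joints on a chain graph; bias favors nearby joints.
rng = np.random.default_rng(0)
N, d_in, d_k, d_v = 4, 8, 8, 8
X = rng.standard_normal((N, d_in))
Wq, Wk, Wv = (rng.standard_normal((d_in, d)) for d in (d_k, d_k, d_v))
hop = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])  # hop distances
bias = -0.5 * hop  # illustrative values; learned per-distance in practice
out = structural_self_attention(X, Wq, Wk, Wv, bias)
```

In a bone stream, the same mechanism would operate on bone (edge) features with the bias adjusted to the bone graph's structure, which matches the two-stream design described in the abstract.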