Achieving high detection accuracy for pavement cracks with complex textures under varying lighting conditions remains challenging. To address this, an encoder-decoder architecture named CrackResAttentionNet was proposed in this study, in which a position attention module and a channel attention module were connected after each encoder stage to aggregate long-range contextual information. The experimental results demonstrated that, compared with other popular models (ENet, ExFuse, FCN, LinkNet, SegNet, and UNet) on the public dataset, CrackResAttentionNet with the BCE loss function and PReLU activation function achieved the best performance in terms of precision (89.40), mean IoU (71.51), recall (81.09), and F1 (85.04). For a self-developed dataset (the Yantai dataset), CrackResAttentionNet with the BCE loss and PReLU activation also performed best, with a precision of 96.17, mean IoU of 83.69, recall of 93.44, and F1 of 94.79. In particular, for the public dataset, the combination of BCE loss and PReLU activation improved precision by 3.21. For the Yantai dataset, the same combination improved precision by 0.99, mean IoU by 0.74, recall by 1.10, and F1 by 1.24.
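The position and channel attention modules mentioned above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, shapes, and the residual connections are illustrative assumptions, following the common formulation in which position attention computes pairwise affinities between spatial locations and channel attention computes affinities between feature channels.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat):
    """Spatial self-attention: every position attends to all others.
    feat: (C, H, W) feature map from one encoder stage (shape assumed)."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)          # (C, N) with N = H*W
    energy = x.T @ x                    # (N, N) pairwise position affinities
    attn = softmax(energy, axis=-1)     # each position's weights over all positions
    out = x @ attn.T                    # re-weight positions by attention
    return feat + out.reshape(C, H, W)  # residual connection (assumed)

def channel_attention(feat):
    """Channel self-attention: every channel attends to all channels."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)
    energy = x @ x.T                    # (C, C) channel affinities
    attn = softmax(energy, axis=-1)
    out = attn @ x                      # re-weight channels by attention
    return feat + out.reshape(C, H, W)
```

Both modules preserve the feature-map shape, which is why they can be inserted after each encoder stage without altering the decoder's expected input sizes.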
Traffic sign detection is critically important in autonomous driving and transportation safety systems. However, accurate detection of traffic signs remains challenging, especially under extreme conditions. This paper proposes a novel convolutional neural network model called Traffic Sign Yolo (TS-Yolo) to improve the detection and recognition accuracy of traffic signs, especially under low-visibility and severely restricted viewing conditions. A copy-and-paste data augmentation method was used to build a large number of new samples from existing traffic-sign datasets. Building on You Only Look Once (YOLOv5), mixed depth-wise convolution (MixConv) was employed to mix different kernel sizes in a single convolution operation, so that patterns at various resolutions can be captured. Furthermore, the attentional feature fusion (AFF) module was integrated to fuse features based on attention, covering same-layer and cross-layer scenarios, including short and long skip connections, and even an initial fusion of a feature map with itself. The experimental results demonstrated that, with YOLOv5 on the augmented dataset, precision reached 71.92, an increase of 34.56 over the unaugmented data, and the mean average precision mAP_0.5 reached 80.05, an increase of 33.11 over the unaugmented data. When MixConv and AFF were added in the TS-Yolo model, precision reached 74.53, which was 2.61 higher than with data augmentation alone, and mAP_0.5 reached 83.73, which was 3.68 higher than with augmentation alone. Overall, the performance of the proposed method was competitive with the latest traffic-sign detection approaches.
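The core idea of MixConv can be sketched in a few lines of NumPy: the input channels are partitioned into groups, and each group is convolved depth-wise with a different kernel size, so one layer captures patterns at several receptive-field scales. The kernel weights and group split below are illustrative placeholders, not the trained TS-Yolo parameters.

```python
import numpy as np

def depthwise_conv(x, kernel):
    """Same-padding depth-wise convolution (2-D correlation) for one channel."""
    k = kernel.shape[0]
    p = k // 2
    xp = np.pad(x, p)                   # zero padding keeps output size == input size
    H, W = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + k, j:j + k] * kernel).sum()
    return out

def mixconv(feat, kernel_sizes=(3, 5, 7)):
    """MixConv sketch: split channels into groups, one kernel size per group.
    feat: (C, H, W). Kernels are random placeholders for illustration."""
    C = feat.shape[0]
    groups = np.array_split(np.arange(C), len(kernel_sizes))
    rng = np.random.default_rng(0)
    out = np.empty_like(feat, dtype=float)
    for idx, k in zip(groups, kernel_sizes):
        for c in idx:
            kern = rng.standard_normal((k, k)) / (k * k)  # placeholder weights
            out[c] = depthwise_conv(feat[c], kern)
    return out
```

Because each group uses same-padding, the output retains the input's spatial dimensions, and the channel groups can simply be written back side by side, exactly as a single convolution layer would produce them.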