.
Semantic scene segmentation has become an important application in computer vision and is an essential part of intelligent transportation systems for complete scene understanding of the surrounding environment. Several methods based on convolutional neural networks have emerged, but they have some problems, including small-scale target loss, inaccurate detailed region segmentation, and boundary category confusion. Using shallow features, we exploit the capabilities of global context information according to the theory of pyramids. A weighted pyramid feature fusion module is constructed to fuse the feature maps of different scales generated by the backbone network, and the proportion of feature fusion is dynamically updated by trainable parameters. After that, a self-attention mechanism is introduced to discover information about spatial channel interdependencies. Finally, the atrous spatial pyramid pooling module of the DeepLabv3+ network is improved by connecting the atrous convolution with different dilation rates at the receptive field. The experimental results show 4.1% mean pixel accuracy and 3.92% mean intersection over union improvements in the proposed method compared with the DeepLabv3+, and the result of semantic segmentation is more accurate.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.