Road detection technology plays an essential role in a variety of applications, such as urban planning, map updating, traffic monitoring and automatic vehicle navigation. Recently, there has been much progress in detecting roads in high-resolution (HR) satellite images using semantic segmentation. However, the objects to be segmented in such images are small, and not all the information in the images is equally important to the decision. This paper proposes a novel approach to road detection that combines semantic segmentation and edge detection: it produces sharp, pixel-accurate segmentation maps and uses the segmented masks to generate road edges. In addition, because some well-known architectures, such as SegNet, use multi-scale features without refinement, we add attention blocks in the encoder to predict fine segmentation masks, which in turn yield finer edges. A combination of weighted cross-entropy loss and focal Tversky loss is used as the loss function to deal with the highly imbalanced dataset. We conducted experiments on two real-world datasets covering the three largest regions in Saudi Arabia and Massachusetts. The results demonstrate that the proposed method of encoding HR feature maps effectively predicts sharp segmentation masks that facilitate accurate edge detection, even against a harsh and complicated background.
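The abstract names a combined weighted cross-entropy plus focal Tversky loss for the imbalanced road/background masks but gives no weights or formulation. Below is a minimal sketch of such a combined loss in PyTorch; the values of alpha, beta, gamma, pos_weight and the mixing factor lam are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F


def focal_tversky_loss(logits, targets, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """Focal Tversky loss for a binary segmentation mask (higher alpha penalizes missed road pixels)."""
    probs = torch.sigmoid(logits)
    tp = (probs * targets).sum()
    fn = ((1 - probs) * targets).sum()
    fp = (probs * (1 - targets)).sum()
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky) ** gamma


def combined_loss(logits, targets, pos_weight=10.0, lam=0.5):
    """Weighted cross-entropy plus focal Tversky loss, mixed with an assumed factor lam."""
    wce = F.binary_cross_entropy_with_logits(
        logits, targets,
        pos_weight=torch.tensor(pos_weight, device=logits.device),
    )
    return lam * wce + (1 - lam) * focal_tversky_loss(logits, targets)


# Example: a batch of 4 single-channel 256x256 predictions and sparse road masks.
logits = torch.randn(4, 1, 256, 256)
masks = (torch.rand(4, 1, 256, 256) > 0.9).float()  # ~10% road pixels -> class imbalance
print(combined_loss(logits, masks).item())
```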
In recent years, the development of smart transportation has accelerated research on semantic segmentation, one of the most important problems in this area. A large receptive field has always been a central concern when designing convolutional neural networks for semantic segmentation. Most recent techniques use max pooling to increase the receptive field of a network at the expense of spatial resolution. Although this idea has improved results in object detection, semantic segmentation also requires high spatial resolution. To address this issue, this paper proposes a new deep learning model, M-Net, which provides both high spatial resolution and a sufficiently large receptive field while keeping the model size to a minimum. The proposed network is based on an encoder-decoder architecture. The encoder uses atrous convolution to encode features at full resolution, and instead of heavy transposed convolutions, the decoder consists of a multipath feature extraction module that extracts multi-scale context information from the encoded features. The experimental results reported in the paper demonstrate the viability of the proposed scheme.
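The abstract describes the architecture only at a high level: an atrous-convolution encoder that preserves full resolution and a multipath decoder that fuses multi-scale context without transposed convolutions. The sketch below illustrates that idea in PyTorch; the channel counts, dilation rates, number of parallel paths, and the class name MNetSketch are assumptions for illustration and do not reproduce the paper's exact M-Net configuration.

```python
import torch
import torch.nn as nn


class AtrousEncoder(nn.Module):
    """Stacks dilated convolutions so the receptive field grows without pooling."""
    def __init__(self, in_ch=3, ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers, prev = [], in_ch
        for d in dilations:
            layers += [nn.Conv2d(prev, ch, 3, padding=d, dilation=d),
                       nn.BatchNorm2d(ch), nn.ReLU(inplace=True)]
            prev = ch
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)  # spatial resolution is preserved


class MultiPathDecoder(nn.Module):
    """Extracts multi-scale context with parallel dilated branches, then fuses them."""
    def __init__(self, ch=32, num_classes=1, rates=(1, 3, 6)):
        super().__init__()
        self.paths = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(ch * len(rates), num_classes, 1)

    def forward(self, x):
        feats = torch.cat([p(x) for p in self.paths], dim=1)
        return self.fuse(feats)  # no transposed convolution needed


class MNetSketch(nn.Module):
    """Schematic encoder-decoder in the spirit of the described M-Net."""
    def __init__(self):
        super().__init__()
        self.encoder = AtrousEncoder()
        self.decoder = MultiPathDecoder()

    def forward(self, x):
        return self.decoder(self.encoder(x))


# Example: full-resolution logits for a 256x256 RGB image.
model = MNetSketch()
print(model(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 1, 256, 256])
```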