“…For the experiments, we report the five foreground classes' F1-scores, their average F1-score, and the mIoU, as in previous works [9,10,28]. The results on the Potsdam dataset are reported in Table 1.…”
Section: Results (mentioning, confidence: 99%)
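The per-class F1-scores and mIoU mentioned in the snippet above are standard segmentation metrics computed from a confusion matrix. A minimal sketch (the function name and the toy 2-class matrix are illustrative, not from the paper):

```python
import numpy as np

def per_class_f1_and_miou(conf):
    """Per-class F1 and mean IoU from a confusion matrix.

    conf[i, j] = number of pixels with ground-truth class i predicted as class j.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp   # predicted as class c, ground truth differs
    fn = conf.sum(axis=1) - tp   # ground truth is class c, prediction differs
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1e-12)
    iou = tp / np.maximum(tp + fp + fn, 1e-12)
    return f1, iou.mean()

# Toy 2-class example: perfect on class 0, half-right on class 1.
conf = np.array([[10, 0],
                 [5, 5]])
f1, miou = per_class_f1_and_miou(conf)
# f1 = [0.8, 0.6667], miou = 0.5833
```

In the Potsdam/Vaihingen protocol the matrix would be accumulated over all test tiles for the five foreground classes before these scores are taken.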
“…To better exploit the multilevel information, global contextual information was used in [27] throughout multiple levels to gain stable results. Attention modules [10,28] were added in the last stages to better aggregate information for the task. The relation module [9] was proposed to model the relationship in the spatial dimension and the feature dimension.…”
“…We chose 16 images to train the model and the remaining 17 to test the model's performance. The test set contained areas 2, 4, 6, 8, 10, 12, 14, 16, 20, 22, 24, 27, 29, 31, 33, 35, and 38, following previous works [9,28,61].…”
Section: Convolutional Embedding and Light Decoding Module (mentioning, confidence: 99%)
“…DCNNs [7,8] take the remote sensing image as the input and directly map the input image into the desired output (class, object boxes, and masks). In the remote sensing image semantic segmentation field, many works [9,10] using convolutional neural networks have been proposed to tackle the problem. The segmentation results are better than traditional methods thanks to the deep layers and the end-to-end training paradigm.…”
The semantic segmentation of remote sensing images requires distinguishing local regions of different classes and exploiting a uniform global representation of same-class instances. Such requirements make it necessary for segmentation methods to extract discriminative local features between different classes and to explore representative features shared by all instances of a given class. While common deep convolutional neural networks (DCNNs) can effectively focus on local features, they are limited by their receptive field in obtaining consistent global information. In this paper, we propose a memory-augmented transformer (MAT) to effectively model both the local and global information. The feature extraction pipeline of the MAT is split into a memory-based global relationship guidance module and a local feature extraction module. The local feature extraction module mainly consists of a transformer, which extracts features from the input images. The global relationship guidance module maintains a memory bank for the consistent encoding of the global information, and global guidance is performed by memory interaction. Bidirectional information flow between the global and local branches is conducted by a memory-query module and a memory-update module, respectively. Experimental results on the ISPRS Potsdam and ISPRS Vaihingen datasets demonstrate that our method performs competitively with state-of-the-art methods.
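The abstract does not give the exact form of the memory-query and memory-update interactions. One plausible reading, sketched in NumPy purely for illustration (all names, shapes, and the moving-average update rule are assumptions, not the paper's formulation), is cross-attention in both directions between local features and a small bank of memory slots:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def memory_query(feats, memory):
    """Local features attend to the memory bank (read global context)."""
    attn = softmax(feats @ memory.T / np.sqrt(feats.shape[-1]))  # (N, K)
    return feats + attn @ memory  # residual fusion of global context

def memory_update(feats, memory, momentum=0.9):
    """Memory slots attend to local features; slots drift by moving average."""
    attn = softmax(memory @ feats.T / np.sqrt(memory.shape[-1]))  # (K, N)
    return momentum * memory + (1 - momentum) * (attn @ feats)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))   # 16 local feature vectors, dim 8
M = rng.standard_normal((4, 8))    # 4 memory slots
fused = memory_query(X, M)         # global guidance for the local branch
M_new = memory_update(X, M)        # local evidence written back to memory
```

The point of such a design is that the memory bank persists across images, so the slots can encode class-level representations that stay consistent beyond any single receptive field.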
“…Paper [14] considers obtaining accurate multiscale semantic information from images for high-quality semantic segmentation. A model called cross fusion net (CF-Net) is proposed for fast and efficient extraction of multiscale semantic information.…”
Section: Improving a Neural Network Model for Semantic Segmentation of Images of Monitored Objects in Aerial Photographs (mentioning)
This paper considers a neural network model for semantically segmenting the images of monitored objects in aerial photographs. Unmanned aerial vehicles monitor objects by analyzing (processing) aerial photographs and video streams. The results of aerial photography are processed by the operator manually; however, handling a large number of aerial photographs presents objective difficulties for the operator, which is why it is advisable to automate this process. An analysis of existing models showed that for the task of semantically segmenting images of monitored objects in aerial photographs, the U-Net model (Germany), a convolutional neural network, is the most suitable base model. This model was improved by adding a wavelet layer and using the optimal values of the training parameters: a learning rate (step) of 0.001, 60 epochs, and the Adam optimization algorithm. Training was conducted on a set of segmented images acquired from aerial photographs (with a resolution of 6,000×4,000 pixels) using the Image Labeler software in the MATLAB R2020b (USA) mathematical programming environment. As a result, a new model for semantically segmenting the images of monitored objects in aerial photographs, named U-NetWavelet, was built.
The effectiveness of the improved model was investigated on an example of processing 80 aerial photographs. Accuracy, sensitivity, and segmentation error were selected as the main indicators of the model's efficiency. The modified wavelet layer made it possible to adapt the size of an aerial photograph to the parameters of the neural network's input layer and to improve the efficiency of image segmentation in aerial photographs; the use of a convolutional neural network allowed this process to be automated.
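The abstract does not detail the modified wavelet layer. The standard building block it presumably adapts is a single-level 2D Haar transform, which halves spatial resolution while preserving all information across four subbands; a sketch for illustration (the function name and subband normalization are assumptions, not the paper's exact layer):

```python
import numpy as np

def haar_wavelet_layer(img):
    """Single-level 2D Haar transform of an (H, W) image with even H and W.

    Returns four half-resolution subbands: LL (approximation),
    LH (horizontal detail), HL (vertical detail), HH (diagonal detail).
    """
    a = img[0::2, 0::2]  # top-left of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0
    lh = (a - b + c - d) / 4.0
    hl = (a + b - c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_wavelet_layer(img)
# ll is the 2x2 block-averaged image
```

Because the transform is lossless, stacking the subbands as channels is one common way to feed large aerial photographs into a fixed-size network input without discarding high-frequency detail, consistent with the size-adaptation role the abstract describes.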