Contextual information is a key factor affecting semantic segmentation. Recently, many methods have tried to use the self-attention mechanism to capture more contextual information. However, these methods with self-attention mechanism need a huge computation. In order to solve this problem, a novel self-attention network, called FFANet, is designed to efficiently capture contextual information, which reduces the amount of calculation through strip pooling and linear layers. It proposes the feature fusion (FF) module to calculate the affinity matrix. The affinity matrix can capture the relationship between pixels. Then we multiply the affinity matrix with the feature map, which can selectively increase the weight of the region of interest. Extensive experiments on the public datasets (PASCAL VOC2012, CityScapes) and remote sensing dataset (DLRSD) have been conducted and achieved Mean Iou score 74.5%, 70.3%, and 63.9% respectively. Compared with the current typical algorithms, the proposed method has achieved excellent performance.
The semantic segmentation of remote sensing images is a critical and challenging task. How to easily and reliably segment useful information from vast remote sensing images is a significant issue. Many methods based on convolutional neural networks have been widely explored to obtain more accurate segmentation from remote sensing images. However, due to the uniqueness of remote sensing images, such as the dramatic changes in the scale of the target object, the results are not satisfactory. To solve the problem, a special network is designed: (1) Create a new backbone network. Compared with ResNet50, the proposed method extracts features of varying sizes more effectively. (2) Reduce spatial information loss. Building a hybrid location module to compensate for the position loss caused by the down-sampling operation. (3) Models with high discriminant ability. In order to improve the discrimination ability of the model, a novel auxiliary loss function is designed to constrain the distance between inter-class and intra-class. The proposed algorithm is tested on remote sensing datasets (e.g., NWPU-45, DLRSD, and WHDLD). The experimental results show that this method obtains the best results and achieves state-of-the-art performance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.