2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00326
Dual Attention Network for Scene Segmentation

Abstract: In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. Unlike previous works that capture contexts by multi-scale feature fusion, we propose a Dual Attention Network (DANet) to adaptively integrate local features with their global dependencies. Specifically, we append two types of attention modules on top of dilated FCN, which model the semantic interdependencies in spatial and channel dimensions respectively. The position attentio…
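The position attention module described in the abstract lets every spatial location aggregate features from all other locations, weighted by pairwise similarity. A minimal NumPy sketch of that idea follows; it is simplified relative to the paper, which derives query, key, and value from separate 1×1 convolutions and scales the attended output with a learnable residual weight, whereas here the raw features are used directly:

```python
import numpy as np

def position_attention(x):
    """Simplified position attention: x has shape (C, H, W).
    Every spatial location attends to all H*W locations."""
    C, H, W = x.shape
    N = H * W
    feat = x.reshape(C, N)                      # (C, N)
    energy = feat.T @ feat                      # (N, N) pairwise similarities
    energy -= energy.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(energy)
    attn /= attn.sum(axis=-1, keepdims=True)    # softmax: each row sums to 1
    out = feat @ attn.T                         # (C, N) attention-weighted sum
    return (feat + out).reshape(C, H, W)        # residual connection
```

The residual add mirrors how the module is appended "on top of" the dilated FCN backbone: the attended context refines, rather than replaces, the local features.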


Cited by 5,417 publications (3,146 citation statements)
References 27 publications
“…One possible reason is the information loss from the pooling layers in the encoding process (by ResNet), which is not suitable for our HSE mapping task and Sentinel-2 data. Furthermore, this loss cannot be compensated, even with the sophisticated design of the decoding part, either with pyramid scene parsing by ResNet-PSPNet, or upsampling with low-level features considered by ResNet-FCN-8, or attention modules proposed in [61]. These observations confirm the assumptions that motivate our framework design: good performance is not guaranteed when simply and directly using the state-of-the-art networks for remote sensing tasks.…”
Section: Comparison With Baseline Methods
confidence: 60%
“…ResNet-FCN-8 [20], and attention-based FCN [61], which have been shown to be more powerful for detailed semantic segmentation. One possible reason is the information loss from the pooling layers in the encoding process (by ResNet), which is not suitable for our HSE mapping task and Sentinel-2 data.…”
Section: Comparison With Baseline Methods
confidence: 99%
“…Due to the attention modules, the proposed method, i.e., mask (m), successfully separates the top right regions. When comparing masks (e), (k), and (n), mask (k) connects the middle regions to some big regions; however, the proposed method, that is, mask (n), smoothly separates the middle regions into the correct shapes. The same results can be obtained in masks (l) and (o).…”

[Results table interleaved in the quoted passage; metric column headers are not recoverable:]
[13]            0.7607  0.6014  0.4933  0.5853
U-net [14]      0.7524  0.6004  0.4814  0.6057
Deeplabv2 [17]  0.7348  0.6004  0.4746  0.5915
RefineNet [19]  0.7641  0.5961  0.4817  0.6134
PSPNet [18]     0.7292  0.6338  0.4934  0.5933
Deeplabv3+ [10] 0.7550  0.6009  0.4828  0.6079
DAN [28]        0.7376  0.6043  0.4786  0.5948
Deepunet [15]   0…
Section: Results
confidence: 99%
“…SCA-CNN uses not only spatial and channel attention but also multilayer attention. Fu et al. [28] proposed a dual attention network (DAN) that uses position attention and channel attention. These two modules were first used in semantic segmentation and achieved good results.…”
Section: Attention Modules
confidence: 99%
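The channel attention module cited above is the counterpart of position attention: instead of locations attending to locations, each channel attends to every other channel via a C×C affinity matrix. A simplified NumPy sketch of that idea (the paper's formulation also includes a max-subtraction variant before the softmax and a learnable residual scale, both omitted here):

```python
import numpy as np

def channel_attention(x):
    """Simplified channel attention: x has shape (C, H, W).
    Each channel attends to all C channels."""
    C, H, W = x.shape
    feat = x.reshape(C, -1)                     # (C, N) with N = H*W
    energy = feat @ feat.T                      # (C, C) channel affinities
    energy -= energy.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(energy)
    attn /= attn.sum(axis=-1, keepdims=True)    # softmax over channels
    out = attn @ feat                           # (C, N) re-weighted channels
    return (feat + out).reshape(C, H, W)        # residual connection
```

In DANet the outputs of the two modules are summed to produce the final representation, so a combined feature could be sketched as `position_attention(x) + channel_attention(x)` given a matching position-attention function.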
“…Self-Attention aims to integrate local features with their global dependencies, and as shown in previous work [42,10], it improves results in image segmentation and generation. Our implementation is based on the dual-attention design of [10].…”
Section: Model Architecture
confidence: 98%