2023
DOI: 10.1109/jstars.2022.3224081
Axial Cross Attention Meets CNN: Bibranch Fusion Network for Change Detection

Abstract: In recent years, the Vision Transformer has demonstrated a capability for global information extraction in computer vision that CNNs lack. However, because the Vision Transformer has no inductive bias, it requires a large amount of data to support its training, and in remote sensing it is costly to obtain a significant number of high-resolution images. Most existing deep-learning change detection networks rely heavily on CNNs, which cannot effectively utilize the long-di…

Cited by 68 publications (41 citation statements)
References 40 publications
“…The SPAM structure in SPANet is used to extract deeper multiscale and salient features, and the FFM structure completes the fusion of low-order and high-order features. In the field of deep learning for remote sensing change monitoring, Song et al. [27] proposed ACABFNet, which extracts the local information of an image with a CNN branch and the global information with a transformer branch, then merges the two through bidirectional fusion.…”
Section: B. Attention Modules
confidence: 99%
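The bibranch idea described above can be sketched in miniature: the CNN branch's local features and the transformer branch's global features each gate the other before the two are combined. The following is a hypothetical pure-Python illustration of bidirectional fusion on small feature vectors, not ACABFNet's actual module (the paper's fusion operates on full feature maps with learned parameters):

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic function, used here as a soft gate."""
    return 1.0 / (1.0 + math.exp(-x))

def bidirectional_fuse(local_feat, global_feat):
    """Toy bidirectional fusion: each branch is re-weighted by a
    sigmoid gate computed from the *other* branch, and the two
    enhanced feature vectors are then summed element-wise."""
    local_enhanced  = [l * sigmoid(g) for l, g in zip(local_feat, global_feat)]
    global_enhanced = [g * sigmoid(l) for l, g in zip(local_feat, global_feat)]
    return [le + ge for le, ge in zip(local_enhanced, global_enhanced)]

# Hypothetical activations: CNN branch (local detail) vs. transformer
# branch (global context) at three feature positions.
local_branch  = [1.0, 0.0, 2.0]
global_branch = [0.0, 1.0, 2.0]
fused = bidirectional_fuse(local_branch, global_branch)
```

Positions where both branches respond strongly reinforce each other in `fused`, while a position active in only one branch is damped by the other's near-zero gate.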
“…Hybrid-TransCD [8] introduces a hybrid transformer structure to capture multi-granularity global context dependencies; it aggregates fine-grained change details with coarse-grained change-region information, effectively retains spectral features in complex scenes, and achieves an F1 score of 90.06 on the LEVIR-CD dataset [9]. ACABFNet [10] extracts the local and global information of the images with a CNN branch and a transformer branch, respectively, then combines the local and global features through bidirectional fusion, reaching an F1 score of 90.68 on LEVIR-CD [9]. SARAS-Net [11] designed three modules — relation-awareness, scale-awareness, and cross-transformer — which effectively address change detection between different scenes; its F1 score on LEVIR-CD reaches 91.91.…”
Section: Introduction
confidence: 99%
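Since the surveyed methods are ranked by F1 score on LEVIR-CD, it may help to recall how that metric is computed for a binary change map. A minimal sketch follows; `pred` and `truth` are hypothetical flattened 0/1 change masks, and dataset loading and model inference are omitted:

```python
def f1_score(pred, truth):
    """F1 = harmonic mean of precision and recall over changed pixels.
    `pred` and `truth` are flat sequences of 0/1 change labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical flattened change masks (1 = changed pixel).
pred  = [1, 1, 0, 1, 0]
truth = [1, 0, 0, 1, 1]
score = f1_score(pred, truth)  # precision = recall = 2/3 here
```

Reported F1 values like 90.68 correspond to this quantity expressed as a percentage over all pixels of the test set.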
“…As a fundamental problem in computer vision, semantic segmentation has seen tremendous improvements over the past few years. As shown in Figure 1, it has been widely applied to medical image recognition [1], 3D point clouds [2], geological exploration [3], cloud and cloud-shadow segmentation [4, 5], remote sensing imagery [6, 7, 8, 9], automatic driving [10], etc. Existing semantic segmentation models based on convolutional neural networks (CNNs) (e.g., U-Net [11] and DeepLab [12]) often rely on a large amount of pixel-level labeled data, which leads to two problems: (1) labeling is labor-intensive, since every training sample must be annotated manually, and (2) these models perform poorly at recognizing novel objects.…”
Section: Introduction
confidence: 99%