2023
DOI: 10.3390/rs15041049
RoadFormer: Road Extraction Using a Swin Transformer Combined with a Spatial and Channel Separable Convolution

Abstract: The accurate detection and extraction of roads using remote sensing technology are crucial to the development of the transportation industry and to intelligent perception tasks. Recently, in view of the advantages of CNNs in feature extraction, a number of CNN-based road extraction methods have been proposed. However, owing to their limited kernel size, these methods are less effective at capturing long-range information and global context, which are crucial for road targets distributed over long distances and …

Cited by 23 publications (8 citation statements). References 34 publications.
“…The network is designed on the basis of the shifted window operation, attention mechanism, and layering. It mainly consists of multi-layer perceptron (MLP), window multi-head self-attention mechanism (W-MSA), shifted window multi-head self-attention mechanism (SW-MSA), and layer normalization (LN), and it has the advantages of strong feature extraction ability, high prediction accuracy, fast reasoning, and a lower computational requirement compared to the original Transformer [ 39 , 40 ]. The structure of the Swin-Transformer network is shown in Figure 4 .…”
Section: Methods
confidence: 99%
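The W-MSA/SW-MSA mechanism summarized in the statement above can be illustrated with a minimal sketch of Swin-style window partitioning and the cyclic shift that lets attention cross window boundaries. This is not code from the paper; the feature-map size and window size are illustrative assumptions, and NumPy is used only to show the data movement.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows."""
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    # Gather each window's rows and columns together: (num_windows, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

H = W = 8          # illustrative feature-map size
window = 4         # illustrative window size
feat = np.arange(H * W, dtype=np.float32).reshape(H, W, 1)

# W-MSA: self-attention is computed independently inside each fixed window.
windows = window_partition(feat, window)            # shape (4, 4, 4, 1)

# SW-MSA: cyclically shift the map by window // 2 before partitioning,
# so the new windows straddle the original window boundaries.
shifted = np.roll(feat, shift=(-window // 2, -window // 2), axis=(0, 1))
shifted_windows = window_partition(shifted, window)  # shape (4, 4, 4, 1)

print(windows.shape, shifted_windows.shape)
```

Alternating these two partitionings across successive layers is what gives the Swin Transformer cross-window information flow at a cost linear in image size, rather than the quadratic cost of global attention.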
“…In the field of road extraction, transformer-based models leverage self-attention mechanisms to capture contextual information and generate accurate road predictions. Examples of transformer-based architectures applied to road extraction include Vision Transformer (ViT) [20], Swin Transformer [21], Swin-UNet [22], BDTNet [23], and RoadFormer [24]. These models have shown promising results in capturing global context and improving the accuracy of road extraction.…”
Section: Transformer-based Road Extraction Methods
confidence: 99%
“…Although deep learning technology is currently popular in road extraction, mainstream deep learning models typically have a large number of parameters, which increases storage and computing resource consumption on devices. In particular, models with better performance often require more resources, which hampers their widespread application [ 40 , 41 ]. This limitation is particularly evident in mobile devices, which have limited memory, lower computational capability, and slower processing speed [ 42 , 43 , 44 , 45 , 46 , 47 ].…”
Section: Introduction
confidence: 99%
“…However, they only considered the parameters of the model, excluding the amount of calculation. Liu et al [ 41 ] constructed a lightweight decoder using the transposed convolution and skip connections, reducing both the number of parameters and the amount of computation required by the model. Nevertheless, these studies only focus on reducing model parameters and computations without investigating portability to mobile devices.…”
Section: Introduction
confidence: 99%
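The lightweight decoder idea described in the statement above can be sketched as a single upsampling stage: a transposed convolution doubles the spatial resolution, and an encoder feature map is added back through a skip connection. This is a hypothetical PyTorch sketch in the spirit of the cited design, not the authors' implementation; the channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LightDecoderStage(nn.Module):
    """One decoder stage: transposed-conv upsampling plus an additive skip."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # A 2x2, stride-2 transposed convolution doubles H and W while
        # reducing channels, keeping the parameter count modest.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, skip):
        x = self.up(x)            # (N, out_ch, 2H, 2W)
        x = x + skip              # skip connection from the encoder
        return torch.relu(self.fuse(x))

# Illustrative shapes: a 16x16 bottleneck upsampled to 32x32.
stage = LightDecoderStage(in_ch=64, out_ch=32)
x = torch.randn(1, 64, 16, 16)
skip = torch.randn(1, 32, 32, 32)
out = stage(x, skip)
print(out.shape)  # torch.Size([1, 32, 32, 32])
```

Using an additive skip (rather than channel concatenation) is one common way to keep both the parameter count and the computation of the decoder low.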