2022
DOI: 10.1109/tgrs.2022.3186634

Building Extraction With Vision Transformer

Abstract: Buildings are an important carrier of human productive activities, so building extraction is essential for urban dynamic monitoring and necessary for suburban construction inspection. Accurate building extraction from remote sensing images remains a challenge because of complex backgrounds and the diverse appearances of buildings. Building extraction methods based on convolutional neural networks (CNNs), although they have increased accuracy significantly, are criticized for their inability for modelling…

Citations: cited by 101 publications (59 citation statements)
References: 103 publications
“…For example, convolutional neural networks (CNN) perform regional divisions, use expensive fully connected layers, and they lose accuracy and details due to stacked convolution layers [12]. Fully convolutional networks (FCN) use fixed-size convolutions that result in a local receptive field; thus, they lack the ability to model in the global context [13]. Transformers, which have been used frequently in recent years, are computationally inefficient and need large amounts of memory and big datasets [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…In conventional vision methods, manually extracted features including geometrical, spatial, and spectral information, and low-level features such as shape, color, edge, texture, and shadow are used [15]. These methods generally utilize these manually extracted features and apply classifiers or traditional machine learning techniques (e.g., Random Forests, Boosting, and Support Vector Machines) to achieve building segmentation [13]. However, extracting these features requires prior knowledge and is labor-intensive [16].…”
Section: Introduction (mentioning)
confidence: 99%