2023
DOI: 10.1109/tnnls.2022.3144791

A Vision Transformer Model for Convolution-Free Multilabel Classification of Satellite Imagery in Deforestation Monitoring

Abstract: Understanding the dynamics of deforestation and land uses of neighboring areas is of vital importance for the design and development of appropriate forest conservation and management policies. In this article, we approach deforestation as a multilabel classification (MLC) problem in an endeavor to capture the various relevant land uses from satellite images. To this end, we propose a multilabel vision transformer model, ForestViT, which leverages the benefits of the self-attention mechanism, obviating any conv…

Cited by 55 publications (26 citation statements)
References 24 publications
“…To improve their annotation, Deep Neural Networks (DNNs) with emphasis on Convolutional Neural Networks (CNNs) for image classification are considered [4]. They are compelling for object detection in remote sensing data, covering several applications including building extraction [5], deforestation [6], land cover change [7] and others.…”
Section: Introduction (mentioning)
confidence: 99%
“…Transformers have been used for multiple computer vision tasks, namely scene classification [19], [20], change detection [21], and image segmentation [22], [23]. ViTs were also used for various tasks in satellite imagery, such as change detection [24] and deforestation monitoring [25]. The ViTs results were convincing, they even outperformed the classical convolutional architectures.…”
Section: A. Vision Transformers (mentioning)
confidence: 99%
“…5(b). In this module, we design an attention mechanism based on the angle channel, while other attention mechanisms have gained special interest in the field of remote sensing in recent years [60,61]. The input of this module is the feature map extracted by the backbone network, and the output is a new feature map with the same shape as the input.…”
Section: Angle Channel Attention Module (mentioning)
confidence: 99%