Multi-View Urban Scene Classification with a Complementary-Information Learning Model (2022)
DOI: 10.14358/pers.21-00062r2

Abstract: Traditional urban scene-classification approaches focus on images taken either by satellite or in aerial view. Although single-view images achieve satisfactory scene-classification results in most situations, the complementary information provided by other image views is needed to further improve performance. We therefore present a complementary-information learning model (CILM) to perform multi-view scene classification of aerial and ground-level images. Specifically, the proposed CILM takes…
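The citation statements below note that CILM combines a contrastive-learning loss with cross-entropy (CE) loss so the two view subnetworks can be fused without sharing weights. As a rough illustration only, the following NumPy sketch shows one plausible form of such a combined objective; the temperature value, the weight `lam`, and all function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the correct class.
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def contrastive_alignment(f_aerial, f_ground, tau=0.1):
    # Pull each aerial/ground embedding pair together and push apart
    # mismatched pairs (an NT-Xent-style term; tau is assumed).
    f_a = f_aerial / np.linalg.norm(f_aerial, axis=1, keepdims=True)
    f_g = f_ground / np.linalg.norm(f_ground, axis=1, keepdims=True)
    sim = f_a @ f_g.T / tau
    targets = np.arange(len(sim))  # the i-th pair is the positive
    return cross_entropy(sim, targets)

def cilm_style_loss(logits_a, logits_g, labels, f_a, f_g, lam=1.0):
    # Total loss = CE on each view's prediction + weighted contrastive term,
    # so the two subnetworks are aligned without sharing any weights.
    return (cross_entropy(logits_a, labels)
            + cross_entropy(logits_g, labels)
            + lam * contrastive_alignment(f_a, f_g))
```

In this reading, the CE terms supervise each view's classifier while the contrastive term only constrains the embedding spaces to agree, which is what allows fusion without weight sharing.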

Cited by 7 publications (10 citation statements); references 21 publications.
“…As mentioned in Section 1, existing multi-view fusion methods can be roughly classified as data-level, feature-level, and decision-level. In this experiment, one data-level fusion method (six-channel [42]), two feature-level fusion methods (feature concatenation [31] and CILM [43]), and five decision-level fusion methods (maximum [31], minimum [31], sum [31], product [31] and SFWS [44]) were chosen to compare with the proposed evidential fusion. These methods are briefly described below.…”
Section: Views (citation type: mentioning; confidence: 99%)
“…Feature concatenation [31]: A Siamese-like CNN is used to concatenate the intermediate feature tensors before the first convolution layer that doubles its amount of kernels. • CILM [43]: The loss function of contrast learning is combined with CE Loss in this method, allowing the features extracted by the two subnetworks to be fused without sharing any weight. • Maximum [31]: Each view employs an independent DNN to obtain its prediction result, which consists of a class label and its probability.…”
(Citation type: mentioning; confidence: 99%)
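The decision-level fusion rules named in the statements above (maximum, minimum, sum, product) each combine the per-view probability vectors produced by independent networks. The sketch below is a minimal illustration of these standard rules, not code from any of the cited papers; the function name and example numbers are invented.

```python
import numpy as np

def fuse_decisions(probs_per_view, rule="product"):
    """Combine per-view class-probability vectors with a fixed rule.

    probs_per_view: array-like of shape (n_views, n_classes), one
    probability vector per independent view network.
    Returns the index of the winning class after fusion.
    """
    p = np.asarray(probs_per_view, dtype=float)
    rules = {
        "maximum": p.max(axis=0),   # most confident view wins per class
        "minimum": p.min(axis=0),   # most cautious view per class
        "sum":     p.sum(axis=0),   # average-style vote across views
        "product": p.prod(axis=0),  # views treated as independent evidence
    }
    return int(np.argmax(rules[rule]))

# Illustrative disagreement: the aerial view is very confident in class 0,
# but the ground view assigns class 0 zero probability.
aerial = [0.98, 0.01, 0.01]
ground = [0.00, 0.60, 0.40]

fuse_decisions([aerial, ground], "sum")      # -> 0 (aerial confidence dominates)
fuse_decisions([aerial, ground], "product")  # -> 1 (ground's zero vetoes class 0)
```

The example highlights the practical difference between the rules: the product rule lets any one view veto a class by assigning it near-zero probability, while the sum rule is dominated by the most confident view.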
“…In other studies [24,60], inpainting and relative position pretext tasks are used for segmentation and classification. In recent years, the contrastive-learning-based SSL approach has also been widely used in remote sensing [27,61–63]. Tile2Vec was the first self-supervised approach using contrastive learning for remote sensing image representation learning [64].…”
Section: Contrastive Learning Based (citation type: mentioning; confidence: 99%)
“…Remote sensing. While contrastive self-supervised learning is relatively new, the wide use of contrastive loss in remote sensing [178,179,180,181,182,183] can date back to [184], where the authors imposed a supervised contrastive regularization term on the CNN features for remote sensing scene classification. The first self-supervised work making use of contrastive learning for remote sensing image representation learning is Tile2Vec proposed by Jean et al [177].…”
Section: InfoNCE (citation type: mentioning; confidence: 99%)