2020
DOI: 10.1007/978-3-030-58586-0_5

Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Abstract: Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While the visual information from overhead images with powerful models and efficient algorithms yields considerable performance on scene recognition, it still suffers from the variation of ground objects, lighting conditions etc. Inspired by the multi-channel perception theory in cognition science, in this paper, for improving the performance on the aerial scene recognition, we explore a novel audiovi…

Cited by 31 publications
(19 citation statements)
References 35 publications
“…Recently, the emergence of deep CNNs brings immense advancements to the community, and many achievements [19]-[34] have been obtained in the field of aerial single-scene classification. These deep networks have hierarchical architectures, where convolutional and max-pooling layers are periodically interleaved for learning high-level features of intricate scenes.…”
Section: A Aerial Single-scene Classification
Citation type: mentioning
confidence: 99%
“…Besides, exploiting supplementary data, such as geotagged audios and multitemporal images, has been a new research direction. Hu et al [19] proposed to predict scene categories by transferring sound event knowledge learned from sound-image pairs. Ru et al [25] proposed a two-branch network to learn deep features of bitemporal images and fused them through a CorrFusion module for aerial scene classification.…”
Section: A Aerial Single-scene Classification
Citation type: mentioning
confidence: 99%
“…Beyond videos, other interesting examples include VQA (see Direction 3, page 8), captioning [27] and audiovisual reasoning, i.e., linking remote sensing images to in-situ audio signals [28]. In the long run, we hope that reasoning Earth observation systems would be capable of deduce clues and make structural inference, in order to explain processes (see direction 5, page 13) and understand causal structures in Earth Systems (see direction 6, page 16).…”
Section: Perspectives
Citation type: mentioning
confidence: 99%
“…Follow up works [2,33] further investigated to jointly learn the visual and audio representation using a visual-audio correspondence task. Instead of learning feature representations, recent works have also explored to localize sound source in images or videos [29,26,3,48,64], biometric matching [39], visual-guided sound source separation [64,15,19,60], auditory vehicle tracking [18], multi-modal action recognition [36,35,21], audio inpainting [66], emotion recognition [1], audio-visual event localization [56], multi-modal physical scene understanding [16], audio-visual co-segmentation [47], aerial scene recognition [27] and audio-visual embodied navigation [17].…”
Section: Audio-visual Learning
Citation type: mentioning
confidence: 99%