2023
DOI: 10.1016/j.jag.2022.103130
Self-supervised audiovisual representation learning for remote sensing data

Cited by 18 publications (12 citation statements)
References 53 publications
“…The primary purpose of data fusion in multi-view (MV) learning is to combine information from different perspectives (views) to provide a broader understanding of the phenomenon and improve the predictive performance of machine learning models [28]. However, sometimes the goal is simply to obtain an embedding for searching similar views, as in MV alignment or representation learning [8], [29], [30]. This alignment is the basis of contrastive learning [27], where a model projects the data from each view into a shared subspace.…”
Section: Literature Review
confidence: 99%
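The shared-subspace alignment described in this excerpt can be sketched with a symmetric InfoNCE-style contrastive loss, where matching pairs across two views are pulled together and mismatched pairs pushed apart. The toy features below are stand-ins, not the cited paper's architecture:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss: row i of z_a and row i of z_b are a positive pair."""
    # L2-normalize so dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) cross-view similarity matrix
    # log-softmax over rows: each view-a sample must pick its view-b partner
    log_prob_a = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # log-softmax over columns: the symmetric direction
    log_prob_b = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return -(np.mean(np.diag(log_prob_a)) + np.mean(np.diag(log_prob_b))) / 2

# Toy paired views: the "audio" view is the "image" view plus noise
rng = np.random.default_rng(0)
x_img = rng.normal(size=(8, 32))
x_aud = x_img + 0.1 * rng.normal(size=(8, 32))
print(info_nce_loss(x_img, x_aud))
```

Aligned views yield a much lower loss than unrelated ones, which is exactly what makes the shared embedding usable for cross-view retrieval.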
“…Here, we compare the results of four schemes, ADVANCE [66], SoundingEarth [68], attention + CNN, and AiT, to demonstrate the effectiveness of schemes based only on the Transformer. All four schemes share the same acoustic features (log-Mel spectrograms).…”
Section: Audio Experiments
confidence: 99%
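The log-Mel spectrogram used as the common acoustic feature here can be computed from first principles: frame the waveform, take the magnitude STFT, pool it through triangular mel-scale filters, and take the log. This is a minimal NumPy sketch with assumed parameters (16 kHz audio, 512-point FFT, 40 mel bands), not the cited papers' exact front end:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):       # rising edge
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            fb[i, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame, window, and take the power spectrum of each frame
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (frames, n_mels)
    return np.log(mel + 1e-10)

# One second of a 440 Hz tone as a toy field recording
t = np.arange(16000) / 16000
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)
```

A pure tone concentrates energy in one low mel band, which is easy to verify on the output and makes the filterbank's frequency warping visible.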
“…Here, we compare the results of ADVANCE [66], SoundingEarth [68], and our two-stage hybrid fusion strategy, and report quantitative results for the different fusion strategies on three metrics (precision, recall, and F1-score). The ADVANCE scheme transfers sound-event knowledge to aerial scene recognition tasks to improve recognition performance.…”
Section: Audiovisual Experiments
confidence: 99%
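For concreteness, the three metrics used to compare fusion strategies can be computed from per-class counts; this sketch macro-averages precision and recall over classes and reports the F1 of those averages (other averaging conventions exist, and the cited work's choice is not specified here):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Macro-averaged precision and recall, and the F1 of those two averages."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    precisions, recalls = [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    p, r = float(np.mean(precisions)), float(np.mean(recalls))
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy 3-class scene predictions
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(precision_recall_f1(y_true, y_pred))
```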
“…Self-supervised learning approaches in Earth observation focus on learning representations from unlabeled data as pretraining, following contrastive [26]-[30] and generative [31]-[33] approaches. The learned representations can then be used for downstream tasks, supported by fine-tuning with curated labels, in scene classification [26], [27], [29], [31], [32] and semantic segmentation [28]-[30] tasks. This two-step strategy is also leaning toward semi-supervision and knowledge distillation [34].…”
Section: Introduction
confidence: 99%
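The two-step strategy in this excerpt — pretrain a representation on unlabeled data, then fine-tune on a small labeled set — can be sketched in miniature. Here PCA stands in for the self-supervised (generative/reconstruction) pretraining objective and a least-squares linear probe stands in for fine-tuning; real systems use deep encoders and contrastive or masked losses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: "pretrain" an encoder on plentiful unlabeled data.
# The first 3 dimensions carry most of the variance (the useful signal).
x_unlabeled = rng.normal(size=(500, 20))
x_unlabeled[:, :3] *= 5.0
mean = x_unlabeled.mean(axis=0)
_, _, vt = np.linalg.svd(x_unlabeled - mean, full_matrices=False)
encoder = vt[:3].T  # (20, 3): project onto the top-3 principal components

def encode(x):
    return (x - mean) @ encoder

# Step 2: "fine-tune" on a small curated labeled set (linear probe).
x_labeled = rng.normal(size=(40, 20))
x_labeled[:, :3] *= 5.0
y = (x_labeled[:, 0] > 0).astype(float)  # toy binary scene label
z = encode(x_labeled)
w, *_ = np.linalg.lstsq(np.c_[z, np.ones(len(z))], y, rcond=None)

def predict(x):
    zx = encode(x)
    return (np.c_[zx, np.ones(len(zx))] @ w > 0.5).astype(float)

acc = float(np.mean(predict(x_labeled) == y))
print(acc)
```

The point of the pattern is that the 20-dimensional labeling problem is solved with only 40 labels, because pretraining already compressed the data into 3 informative coordinates.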