2023
DOI: 10.48550/arxiv.2303.11302
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Abstract: Self-supervised audio-visual source localization aims to locate sound-source objects in video frames without extra annotations. Recent methods often approach this goal with the help of contrastive learning, which assumes only the audio and visual contents from the same video are positive samples for each other. However, this assumption would suffer from false negative samples in real-world training. For example, for an audio sample, treating the frames from the same audio class as negative samples may mislead … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 28 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?