Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning

Avgoustinakis, Pavlos; Kordopatis-Zilos, Giorgos; Papadopoulos, Symeon; Symeonidis, Andreas L.; Kompatsiaris, Ioannis

doi:10.48550/arxiv.2010.08737

Cited by 3 publications

(3 citation statements)

References 21 publications


“…More recently, [4] use a siamese neural network framework to learn to encode semantically similar audio close together in the embedding space. [35] address multimedia event detection using only audio data, while [36] tackle near-duplicate video retrieval by audio retrieval. These are purely audio-based methods that are applied to video datasets, but without using visual information.…”

Section: Related Work (mentioning)

confidence: 99%

Audio Retrieval with Natural Language Queries

Oncescu, Koepke, Henriques, et al., 2021

Preprint


We consider the task of retrieving audio using free-form natural language queries. To study this problem, which has received limited attention in the existing literature, we introduce challenging new benchmarks for text-based audio retrieval using text annotations sourced from the AUDIOCAPS and CLOTHO datasets. We then employ these benchmarks to establish baselines for cross-modal audio retrieval, where we demonstrate the benefits of pre-training on diverse audio tasks. We hope that our benchmarks will inspire further research into cross-modal text-based audio retrieval with free-form text queries.
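The citation statement above describes encoding audio so that semantically similar clips lie close together in an embedding space, and retrieving near-duplicate videos from their audio alone. As a minimal sketch, assuming each clip has already been mapped to a fixed-length embedding vector by some encoder (not shown), retrieval reduces to ranking database items by their similarity to the query embedding. The function names, the toy vectors, and the use of plain cosine similarity are illustrative assumptions, not the method of any paper listed on this page.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], database: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every database clip against the query
    embedding and return (clip_id, score) pairs, most similar first."""
    scored = [(clip_id, cosine_similarity(query, emb)) for clip_id, emb in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real encoder outputs.
    db = {"clip_a": [1.0, 0.0, 0.0], "clip_b": [0.9, 0.1, 0.0], "clip_c": [0.0, 1.0, 0.0]}
    ranking = rank_by_similarity([1.0, 0.0, 0.0], db)
    print([clip_id for clip_id, _ in ranking])  # ['clip_a', 'clip_b', 'clip_c']
```

In a near-duplicate setting, the top-ranked clips (or those above a chosen score threshold) would be the candidate duplicates; the quality of the ranking depends entirely on how well the encoder places similar audio close together.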



“…More recently, [4] use a siamese neural network framework to learn to encode semantically similar audio close together in the embedding space. [32] address multimedia event detection using only audio data, while [33] tackle near-duplicate video retrieval by audio retrieval. These are purely audio-based methods that are applied to video datasets, but without using visual information.…”

Section: Related Work (mentioning)

confidence: 99%

Audio Retrieval with Natural Language Queries

Oncescu, Koepke, Henriques, et al., 2021

Interspeech 2021



“…After training, the learned metric shall return small distances among data from the same class and large distances among data from different classes. DML plays important roles in a diverse set of applications including image/video retrieval [188][189][190][191][192], person re-identification [193][194][195][196][197], vehicle re-identification [198,199], and self-supervised learning [200][201][202][203].…”

Section: Discussion (mentioning)

confidence: 99%

AI-empowered promotional video generation

Liu


Professor Han Yu for his guidance and kind support throughout my Ph.D. study. This thesis would not have been possible without your insightful advice and fruitful discussions. I would also like to thank Prof Chunyan Miao, Prof Boyang Li, and Dr. Zhiqi Shen for their support and constructive conversations. It is a great honor for me to pursue my Ph.D. under your guidance. I wish to thank my collaborators who worked on promotional video generation together and provided support along my Ph.D. journey:
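The deep metric learning (DML) objective quoted in the citation statement above (small distances between same-class data, large distances between different-class data) is often trained with a pairwise contrastive loss. The sketch below uses one common form of that loss: squared Euclidean distance for same-class pairs, and a squared hinge max(0, margin - d)^2 for different-class pairs. The function names and the default margin of 1.0 are illustrative assumptions; the cited thesis does not state which loss it uses.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    a: Sequence[float], b: Sequence[float], same_class: bool, margin: float = 1.0
) -> float:
    """Pairwise contrastive loss (one common form, hypothetical helper).

    Same-class pairs are penalised by their squared distance, pulling them together.
    Different-class pairs are penalised only while they are closer than `margin`,
    pushing them apart until they are at least `margin` away.
    """
    d = euclidean_distance(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    anchor, other = [0.0, 0.0], [3.0, 4.0]  # distance 5.0
    print(contrastive_loss(anchor, other, same_class=True))               # 25.0
    print(contrastive_loss(anchor, other, same_class=False, margin=1.0))  # 0.0
    print(contrastive_loss(anchor, other, same_class=False, margin=6.0))  # 1.0
```

In practice the loss is averaged over many sampled pairs and minimised with respect to the parameters of the network that produces the embeddings; the margin controls how far apart different-class pairs must be before they stop contributing.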

