2020
DOI: 10.48550/arxiv.2010.08737
Preprint

Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning

Abstract: In this work, we address the problem of audio-based near-duplicate video retrieval. We propose the Audio Similarity Learning (AuSiL) approach that effectively captures temporal patterns of audio similarity between video pairs. For the robust similarity calculation between two videos, we first extract representative audio-based video descriptors by leveraging transfer learning based on a Convolutional Neural Network (CNN) trained on a large scale dataset of audio events, and then we calculate the similarity mat…

Cited by 3 publications (3 citation statements)
References 21 publications
“…More recently, [4] use a siamese neural network framework to learn to encode semantically similar audio close together in the embedding space. [35] address multimedia event detection using only audio data, while [36] tackle near-duplicate video retrieval by audio retrieval. These are purely audio-based methods that are applied to video datasets, but without using visual information.…”
Section: Related Work (mentioning)
confidence: 99%
“…More recently, [4] use a siamese neural network framework to learn to encode semantically similar audio close together in the embedding space. [32] address multimedia event detection using only audio data, while [33] tackle near-duplicate video retrieval by audio retrieval. These are purely audio-based methods that are applied to video datasets, but without using visual information.…”
Section: Related Work (mentioning)
confidence: 99%
“…After training, the learned metric shall return small distances among data from the same class and large distances among data from different classes. DML plays important roles in a diverse set of applications including image/video retrieval [188][189][190][191][192], person re-identification [193][194][195][196][197], vehicle re-identification [198,199], and self-supervised learning [200][201][202][203].…”
Section: Discussion (mentioning)
confidence: 99%