Interspeech 2012
DOI: 10.21437/interspeech.2012-556

Event-based video retrieval using audio

Abstract: Multimedia Event Detection (MED) is an annual task in the NIST TRECVID evaluation, and requires participants to build indexing and retrieval systems for locating videos in which certain predefined events are shown. Typical systems focus heavily on the use of visual data. Audio data, however, also contains rich information that can be effectively used for video retrieval, and MED could benefit from the attention of researchers in audio analysis. We present several systems for performing MED using only audio dat…

Cited by 46 publications (15 citation statements)
References 7 publications
“…More recently, [4] use a siamese neural network framework to learn to encode semantically similar audio close together in the embedding space. [35] address multimedia event detection using only audio data, while [36] tackle near-duplicate video retrieval by audio retrieval. These are purely audio-based methods that are applied to video datasets, but without using visual information.…”
Section: Related Work (mentioning)
confidence: 99%
“…Sound event detection (SED), in which the types of sound event are identified and their onset and offset in an audio recording are estimated, is one of the principal tasks in environmental sound analysis [1,2]. Recently, many works have addressed SED because it plays an important role in realizing various applications using artificial intelligence in sounds, e.g., automatic life logging, machine monitoring, automatic surveillance, media retrieval, and biomonitoring systems [3,4,5,6,7,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…Sound event detection (SED) is the task of detecting sound event labels and their onset/offset in an audio recording, where a sound event indicates a type of sound such as "people talking" and "bird singing" [1]. SED plays an important role in realizing various applications using artificial intelligence in sounds, such as automatic life-logging, machine monitoring, automatic surveillance, media retrieval, and biomonitoring systems [2][3][4][5][6][7][8].…”
Section: Introduction (mentioning)
confidence: 99%