1999
DOI: 10.1007/s005300050106

An overview of audio information retrieval

Abstract: The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an "AltaVista" for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and mu…

Cited by 265 publications (108 citation statements). References: 46 publications.

“…[11] is invented by the Computer Vision community to retrieve multimedia objects based on low-level features that can be automatically extracted from the objects. In the past decade, a large number of CBR systems have been built for image retrieval (such as QBIC [9], MARS [25], and VisualSEEK [29], as overviewed in [28]), video retrieval (such as Informedia [13], VideoQ [5]), and audio retrieval [10]. The low-level features used in CBR techniques vary from one type of multimedia to another, ranging from keywords for texts, color and texture for images, and pitch and melody for audios.…”
Section: Connection With Previous Work (mentioning, confidence: 99%)

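The quoted passage names pitch as a typical low-level feature that can be extracted automatically from audio for content-based retrieval. Purely as an illustration, here is a minimal sketch of one common way to extract it: autocorrelation-based pitch estimation on a single frame. The function name `estimate_pitch` and the default search range `fmin`/`fmax` are hypothetical choices and are not taken from the paper or the works quoted here.

```python
import math
from typing import Sequence


def estimate_pitch(frame: Sequence[float], sample_rate: int,
                   fmin: float = 50.0, fmax: float = 500.0) -> float:
    """Estimate the fundamental frequency (Hz) of one audio frame.

    Picks the lag with the largest autocorrelation inside the lag range
    implied by [fmin, fmax]; returns 0.0 if no lag correlates positively
    (e.g. a silent frame).
    """
    n = len(frame)
    if n == 0:
        return 0.0
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset first
    min_lag = max(1, int(sample_rate / fmax))  # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)  # longest period that fits
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: similarity of the frame to itself
        # shifted by `lag` samples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


if __name__ == "__main__":
    sr = 8000
    # Synthetic 200 Hz tone, 1024 samples (128 ms at 8 kHz).
    tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(1024)]
    print(round(estimate_pitch(tone, sr)))  # 200
```

Because the sum is unnormalized, it slightly favours shorter lags, which helps avoid picking a multiple of the true period. Practical pitch trackers use more robust measures (YIN is one well-known example) and add voicing decisions that this sketch omits.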
“…Showing all these segments on MeetingTree as separate audio nodes would limit the usefulness of the tool as it is not possible to have a full view of the meeting tree on the screen. This is a classic problem for visualisation of large volumes of data, and in audio browsing in particular [5]. We are currently exploring various strategies for reducing the number of speech segments without compromising the integrity and structure of the recording.…”
Section: Combining Speech Segments (mentioning, confidence: 99%)

“…A great deal of research in automatic indexing of and access to time-based media has focused on the issue of translating these media from their typical transient and sequential form into a parallel and persistent presentation [4,5,6]. Descriptors thus extracted can then be used in conjunction with existing text retrieval techniques to provide content-based indexing.…”
Section: Introduction (mentioning, confidence: 99%)

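The second sentence of that statement describes a two-step pipeline: turn transient audio into persistent text descriptors, then reuse ordinary text retrieval on them. A minimal sketch of the second step, assuming time-stamped transcripts (for example, speech recognizer output per segment) already exist, is an inverted index that maps query words back to time ranges in the recording. `Segment`, `build_index`, and `search` are hypothetical names, and the sketch ignores recognition errors, which a real system would have to handle.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str     # descriptor text, e.g. an ASR transcript (assumed input)


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; apostrophes are kept (it's, don't)."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(segments: List[Segment]) -> Dict[str, List[int]]:
    """Map each word to the sorted ids of the segments that contain it."""
    index = defaultdict(set)
    for seg_id, seg in enumerate(segments):
        for word in tokenize(seg.text):
            index[word].add(seg_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index: Dict[str, List[int]], segments: List[Segment],
           query: str) -> List[Tuple[float, float]]:
    """Return (start, end) times of segments containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))  # AND semantics across query words
    return [(segments[i].start, segments[i].end) for i in sorted(hits)]


if __name__ == "__main__":
    # Hypothetical answering-machine messages, already transcribed.
    messages = [
        Segment(0.0, 4.2, "Hi, it's Anna, call me back about the meeting"),
        Segment(4.2, 9.8, "The meeting moved to Thursday"),
        Segment(9.8, 12.0, "Reminder: dentist on Friday"),
    ]
    idx = build_index(messages)
    print(search(idx, messages, "meeting"))  # [(0.0, 4.2), (4.2, 9.8)]
```

Returning time ranges rather than text lets a browser jump straight to the matching audio, which is the point of indexing the descriptors instead of replacing the recording with them.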
“…One of the important cues for human situational awareness is sound, yet automated sound recognition remains a challenging and little understood problem [2,3,4,5].…”
Section: Introduction (mentioning, confidence: 99%)