2006
DOI: 10.1109/tasl.2006.878256

An overview of automatic speaker diarization systems

Abstract: Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used for helping speech recognition, facilitating the searching and indexing of audio archives, and increasing the richness of automatic transcriptions, making them more r…
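The abstract defines diarization output as source labels attached to possibly overlapping temporal regions. As an illustration only, the following Python sketch assumes a hypothetical `Segment` record (start and end in seconds plus a source label) and shows how such an annotation can be queried for the sources active at a given time and for the total amount of overlap. It is not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical annotation record: a source label on a time region (seconds)."""
    start: float
    end: float
    label: str  # e.g. a speaker ID, "music", or "noise"


def active_sources(segments: List[Segment], t: float) -> List[str]:
    """Return the sorted labels of all sources active at time t.

    Regions may overlap, so more than one label can be returned.
    """
    return sorted({s.label for s in segments if s.start <= t < s.end})


def overlap_duration(segments: List[Segment]) -> float:
    """Total time (seconds) during which two or more regions overlap."""
    # Sweep over region boundaries, tracking how many regions are open.
    events = []
    for s in segments:
        events.append((s.start, 1))
        events.append((s.end, -1))
    # On ties, (-1) sorts before (+1), so regions that merely touch do not overlap.
    events.sort()
    total, depth, prev = 0.0, 0, None
    for time, delta in events:
        if prev is not None and depth >= 2:
            total += time - prev
        depth += delta
        prev = time
    return total


annotation = [
    Segment(0.0, 4.0, "spk1"),
    Segment(3.5, 7.0, "spk2"),
    Segment(5.0, 9.0, "music"),
]
print(active_sources(annotation, 3.75))  # ['spk1', 'spk2']
print(overlap_duration(annotation))      # 2.5
```

A sweep over sorted boundaries keeps the overlap computation linear after sorting, rather than comparing every pair of regions.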

Cited by 511 publications (329 citation statements)
References 28 publications

“…[2,3], although top-down systems also achieve competitive results [4]. While some have reported that bottom-up approaches are more robust than their top-down counterparts [5] our own work [6] shows that the two approaches give comparable results, with neither being consistently superior to the other.…”
Section: Introduction (mentioning)
confidence: 86%
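The excerpt contrasts bottom-up and top-down clustering. A bottom-up (agglomerative) system starts with many small clusters, typically one per segment, and repeatedly merges the most similar pair until a stopping criterion is met; a top-down system instead starts from a single cluster and splits it. The sketch below illustrates only the bottom-up loop under simplifying assumptions: each segment is summarized by a hypothetical fixed-length feature vector, cluster similarity is the Euclidean distance between centroids, and a fixed `threshold` stands in for the model-based criteria (such as BIC) that real systems commonly use.

```python
import math
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bottom_up_cluster(segment_features: List[Sequence[float]], threshold: float) -> List[int]:
    """Agglomerative clustering with a simple distance threshold as stopping rule.

    Starts with one cluster per segment, merges the two closest clusters, and
    stops once the closest pair is farther apart than `threshold`.
    Returns a cluster index for each input segment.
    """
    clusters = [[i] for i in range(len(segment_features))]
    while len(clusters) > 1:
        cents = [centroid([segment_features[i] for i in c]) for c in clusters]
        best = None  # (distance, a, b) for the closest pair found so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # no remaining pair is similar enough to merge
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(segment_features)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(bottom_up_cluster(features, 2.0))  # [0, 0, 1, 1]
```

The stopping rule doubles as the estimate of the number of speakers: a lower threshold leaves more clusters, a higher one merges more aggressively.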
“…This paper simply defines TDOA as the TDOA between the first channel $X_1(\omega, f)$ and others. Finally, TDOA of the $m$-th channel in the $f$-th frame $\tau_m(f)$ is computed as follows:…”
Section: A. Estimation of Number of Sound Sources, 1) TDOA Estimation (mentioning)
confidence: 99%
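The formula for $\tau_m(f)$ is cut off in this excerpt, so it is not reproduced here. As a stand-in, the sketch below estimates a per-frame delay of each channel relative to the first channel by picking the peak of a plain time-domain cross-correlation. This is a generic technique chosen for illustration and not necessarily the method of the cited paper. The function names, `max_lag`, `frame_len`, and `hop` are hypothetical; channel index 0 plays the role of the first channel, and delays are in samples (divide by the sampling rate to get seconds).

```python
from typing import List, Sequence


def tdoa_samples(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Delay of `sig` relative to `ref`, in samples, via cross-correlation peak.

    Searches lags in [-max_lag, max_lag]; a positive result means `sig`
    arrives later than `ref`.
    """
    n = min(len(ref), len(sig))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate ref[t] with sig[t + lag] over indices where both exist.
        score = sum(ref[t] * sig[t + lag] for t in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def frame_tdoas(channels: Sequence[Sequence[float]], frame_len: int,
                hop: int, max_lag: int) -> List[List[int]]:
    """Per-frame delay of every channel m >= 1 relative to channel 0.

    Result[m - 1][f] is the delay (in samples) of channel m in frame f.
    """
    ref = channels[0]
    n_frames = 1 + (len(ref) - frame_len) // hop
    return [
        [tdoa_samples(ref[f * hop:f * hop + frame_len],
                      ch[f * hop:f * hop + frame_len], max_lag)
         for f in range(n_frames)]
        for ch in channels[1:]
    ]


ref = [0.0, 1.0, 3.0, -2.0, 1.0, 0.0, 0.0, 0.0]
sig = [0.0, 0.0, 0.0, 1.0, 3.0, -2.0, 1.0, 0.0]  # ref delayed by 2 samples
print(tdoa_samples(ref, sig, 3))              # 2
print(frame_tdoas([ref, sig], 8, 8, 3))       # [[2]]
```

The brute-force search costs O(n · max_lag) per frame; practical systems usually compute the correlation in the frequency domain instead.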
“…Speaker diarization research mainly tackles the simultaneous estimation of speaker segmentation (voice activity detection) and clustering (number of speaker estimation). Beside monaural signal based methods [1], [2], microphone array technologies tackles this by introducing spatial information about the speakers. However, most of the existing methods assume that the microphone location is given to estimate the direction of arrival of speakers [3]- [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…The class types considered may vary by application but a typical partitioning might distinguish pure music, pure speech, noise, combined speech and music and combined speech and noise (Tranter & Reynolds, 2006). The resultant partitioning may provide useful metadata for the purpose of flexible access, but such partitioning is also an important prerequisite for speech-to-text transcription systems (e.g.…”
Section: Automatic Annotation (mentioning)
confidence: 99%
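A partitioning into the class types listed above can be obtained by merging frame-level class decisions into contiguous regions. The sketch below assumes that a hypothetical upstream classifier has already assigned one of the five listed classes to each fixed-length frame, and simply collapses runs of identical labels; the label strings and the `frame_dur` parameter are illustrative and not taken from the cited work.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical label strings for the five classes named in the excerpt.
CLASSES = {"music", "speech", "noise", "speech+music", "speech+noise"}


def partition(frame_labels: List[str], frame_dur: float) -> List[Tuple[float, float, str]]:
    """Collapse per-frame class decisions into (start, end, class) regions in seconds."""
    regions = []
    start = 0  # frame index where the current run begins
    for label, run in groupby(frame_labels):
        if label not in CLASSES:
            raise ValueError(f"unknown class: {label}")
        n = sum(1 for _ in run)  # length of this run of identical labels
        # Times are derived from frame indices to avoid accumulating rounding error.
        regions.append((start * frame_dur, (start + n) * frame_dur, label))
        start += n
    return regions


frames = ["music", "music", "speech+music", "speech", "speech"]
print(partition(frames, 0.5))
# [(0.0, 1.0, 'music'), (1.0, 1.5, 'speech+music'), (1.5, 2.5, 'speech')]
```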