2012
DOI: 10.1109/tasl.2011.2125954

Speaker Diarization: A Review of Recent Research

Abstract: Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. …

Cited by 570 publications (379 citation statements)
References 87 publications
“…This error is referred to as speech time error in the results computed by NIST tools¹. Others [13] choose to report the FA speaker and Miss speaker errors inclusive of overlap, e.g. a segment which contains two speakers that has been completely missed by the system will have twice the error.…”
Section: Diarization Error Rate (mentioning)
confidence: 99%
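
The overlap-inclusive counting described in the statement above can be illustrated with a minimal sketch. It assumes segments are given as hypothetical `(start, end, speaker)` tuples in seconds, that no speaker overlaps with itself, and that missed and false-alarm speaker time are weighted by the difference in the number of active speakers. The function names are illustrative and are not taken from the NIST tools or from the cited paper.

```python
def count_active(segments, t0, t1):
    """Count segments that fully cover the interval [t0, t1)."""
    return sum(1 for start, end, _ in segments if start <= t0 and t1 <= end)


def miss_and_false_alarm(reference, hypothesis):
    """Return (missed, false-alarm) speaker time, inclusive of overlap.

    Assumption: each segment is a (start, end, speaker) tuple and a
    speaker never overlaps with itself.
    """
    # Every segment boundary splits the timeline into intervals in which
    # the number of active speakers is constant.
    bounds = sorted({t for s, e, _ in reference + hypothesis for t in (s, e)})
    miss = 0.0
    false_alarm = 0.0
    for t0, t1 in zip(bounds, bounds[1:]):
        n_ref = count_active(reference, t0, t1)
        n_hyp = count_active(hypothesis, t0, t1)
        duration = t1 - t0
        # Each unmatched reference speaker adds the full interval duration,
        # so a missed two-speaker region counts twice.
        miss += duration * max(n_ref - n_hyp, 0)
        false_alarm += duration * max(n_hyp - n_ref, 0)
    return miss, false_alarm


# Speakers A and B overlap from 5 s to 10 s; the system outputs nothing there.
reference = [(0.0, 10.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 5.0, "x")]
print(miss_and_false_alarm(reference, hypothesis))
```

In this example the 5-second overlapped region is missed entirely, so it contributes 10 seconds of missed speaker time, which is the "twice the error" case the statement mentions.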
“…The task of speaker diarisation is an important prerequisite task for audio indexing, automatic speech recognition (ASR) and more [1,2]. The objective is to split the audio into segments which are associated with a single speaker, and to identify among the set of segments those that are spoken by the same speaker.…”
Section: Introduction (mentioning)
confidence: 99%
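
The two-part objective in the statement above (split the audio into single-speaker segments, then identify which segments share a speaker) can be pictured with a small sketch of a diarization output. The `(start, end, label)` tuple layout and the labels `spk0` and `spk1` are assumptions made for illustration; real systems use their own formats, and the labels are arbitrary cluster identifiers rather than speaker identities.

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Group (start, end, label) segments by their cluster label."""
    grouped = defaultdict(list)
    for start, end, label in segments:
        grouped[label].append((start, end))
    return dict(grouped)


# Hypothetical output: three single-speaker segments, two of which
# were assigned to the same speaker cluster.
segments = [(0.0, 2.0, "spk0"), (2.0, 5.0, "spk1"), (5.0, 6.5, "spk0")]
print(group_by_speaker(segments))
```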
“…Speaker diarization (the process of partitioning an input audio stream into homogeneous segments according to speaker identity), when used together with speaker recognition (the identification of speakers by their voices), has become an important key technology for tasks such as navigation, retrieval, and high-level inference from audio data in meeting recordings. Some speaker diarization systems integrate motion and gazing data analyses with audio data analysis to achieve higher accuracy and robustness (Anguera et al 2012; Moattar and Homayounpour 2012). There are also meeting systems that use multimodal data including both motion and gaze (Hain et al 2010; Tur et al 2008).…”
Section: Introduction (mentioning)
confidence: 99%