2012
DOI: 10.1109/tasl.2011.2162320

Large-Scale Speaker Diarization for Long Recordings and Small Collections

Abstract: Performing speaker diarization of very long recordings is a problem for most diarization systems that are based on agglomerative clustering with an HMM topology. Performing collection-wide speaker diarization, where each speaker is identified uniquely across the entire collection, is an even more challenging task. In this paper we propose a method with which it is possible to efficiently perform diarization of long recordings. We have also applied this method successfully to a collection of a total duration of a…

Cited by 27 publications (17 citation statements)
References 15 publications
“…Each node is initialized as its own cluster and iteratively merged with other clusters via some similarity metric - we found the unweighted group average affinity to perform best [11] - until some stopping criterion (e.g., BIC [7,10], maximum distance [8], number of clusters, etc.) is met.…”
Section: Graph Clustering Algorithms Agglomerative Hierarchical Clust… (mentioning)
confidence: 99%
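The merging procedure described in the citation statement above can be illustrated with a minimal sketch. It is not the implementation of the citing paper or of the cited one: it assumes points in Euclidean space, uses mean pairwise distance as the unweighted group average criterion, and stops on a maximum-distance threshold. The names `average_linkage`, `agglomerative_cluster`, and `max_distance` are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    """Unweighted group average: mean pairwise distance between clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative_cluster(points, max_distance):
    """Merge the closest pair of clusters until none is within max_distance.

    Assumption: Euclidean distance stands in for whatever similarity
    metric a real diarization system would use on speaker models.
    """
    # Each point starts as its own cluster (stored as a list of indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (x, y), d = min(
            (
                ((x, y), average_linkage(clusters[x], clusters[y], points))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        # Stopping criterion: the closest pair is still too far apart.
        if d > max_distance:
            break
        # x < y always holds here, so deleting y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(agglomerative_cluster(points, max_distance=3.0))
```

Swapping the threshold test for a BIC comparison or a target number of clusters would give the other stopping criteria mentioned in the statement; this brute-force search recomputes all pairwise distances at every step and is only meant for small inputs.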
“…The related problem of large-scale speaker diarization was also explored for long recordings and small collections in [8]. Both works solely consider the use of agglomerative hierarchical clustering; neither considers the community detection algorithms or the sparse graph structure we investigate in this paper.…”
Section: Introduction (mentioning)
confidence: 99%
“…in a multispeaker audio stream [1]. Some of the practical applications of diarization technology include information retrieval [2], broadcast news, meeting conversations, telephone calls, VoIP, digital audio logging [3] and interaction analysis in Peer-Led Team Learning (PLTL) groups [4,5,6,7]. Diarization is a challenging task for naturalistic audio streams as they contain short conversational turns, overlapped speech, noise and reverberation [8,9].…”
Section: Introduction (mentioning)
confidence: 99%
“…Cluster and speaker purity measures are given to compare MAP and JFA approaches to adaptation. Targeting large-scale speaker diarization is [5], which proposes a multi-stage system involving speaker diarization followed by speaker linking of chunks of speech data. Although splitting the database in small chunks increases diarization error rates, this system scales particularly well on large data sets.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although splitting the database in small chunks increases diarization error rates, this system scales particularly well on large data sets. Only the work in [5] focuses on interview and meeting data, the others targeting telephone speech conversations between two people. In our work, we target a challenging scenario with meetings of 4 participants each recorded using various types of far-field microphones and several recording rooms.…”
Section: Introduction (mentioning)
confidence: 99%