2012
DOI: 10.1587/transinf.e95.d.2469

Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model

Abstract: A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer clas…

Cited by 6 publications
(5 citation statements)
References 11 publications
“…This algorithm can be seen as an online version of the VBx [27] with simplified prior on assignments p(Z). It is also similar to the algorithm from [7], where the authors modified the offline variational inference to make it suitable for online processing.…”
Section: Model-based Clustering (mentioning)
confidence: 99%
“…Another example is online speaker diarization or clustering [4][5][6][7][8][9]. In this case, short speech segments from an audio stream have to be classified with low latency (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…A number of online diarization [11,12,13,14,15] and speaker tracking [16,17] solutions have been reported. These use online speaker clustering algorithms [18,19]. Only speaker tracking systems assume prior knowledge of target speakers but they do not consider latency.…”
Section: Prior Work (mentioning)
confidence: 99%
“…We first considered an approach of using a video recognition technology to summarize a movie. In audio and video recognition technology, there are studies of "speaker clustering" to classify speeches to each speaker [24], and "speaker indexes" to capture "who spoke, and when" [25]. Speaker clustering is a technique that classifies speech utterances from multiple speakers in broadcast news, and meetings, etc.…”
Section: Related Work (mentioning)
confidence: 99%