Interspeech 2020
DOI: 10.21437/interspeech.2020-2297
Deep Self-Supervised Hierarchical Clustering for Speaker Diarization

Abstract: Automatic speaker diarization techniques typically involve a two-stage processing approach where audio segments of fixed duration are converted to vector representations in the first stage. This is followed by an unsupervised clustering of the representations in the second stage. In most of the prior approaches, these two stages are performed in an isolated manner with independent optimization steps. In this paper, we propose a representation learning and clustering algorithm that can be iteratively performed …
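The two-stage pipeline the abstract describes (fixed-duration segment embeddings, then unsupervised clustering) can be sketched with a toy average-linkage agglomerative clustering over cosine distances. This is an illustrative NumPy sketch of the generic baseline, not the authors' proposed joint algorithm; the embedding vectors are assumed given (e.g. from an x-vector extractor):

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity between two embedding vectors
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ahc(embeddings, num_speakers):
    # Agglomerative hierarchical clustering with average linkage.
    # Start with every segment in its own cluster.
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > num_speakers:
        # Find the pair of clusters with the smallest average pairwise distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.mean([cosine_distance(embeddings[a], embeddings[b])
                             for a in clusters[i] for b in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy example: four segment embeddings from two well-separated speakers.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(ahc(emb, 2))
```

In practice the number of speakers is usually unknown, so merging is stopped by a distance threshold rather than a fixed cluster count.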

Cited by 4 publications (9 citation statements) · References 74 publications (148 reference statements)
“…This paper extends our previous works on self-supervised learning and graph-based clustering [15, 16]. The previous works proposed representation learning and graph-based clustering in an iterative self-supervised learning framework.…”
Section: Related Work and Contributions (supporting, confidence: 54%)
“…In particular, both the embeddings and the adjacency matrix for graph-based clustering are jointly learned. Using this joint learning, we show significant performance improvements over baseline systems and previous models based on self-supervised graph clustering methods [15, 16].…”
Section: Introduction (mentioning, confidence: 89%)
“…This section describes the performance analysis of the proposed SDS‐HXLP‐DCNN‐SOA speaker diarization scheme in terms of the metrics tracking distance, FAR, and DER. The performance is compared with existing methods: real‐time implementation of a speaker diarization system on Raspberry Pi 3 using the TLBO clustering algorithm (SDS‐RPi3‐TLBO), 35 deep self‐supervised hierarchical clustering for speaker diarization (SDS‐TDNN), 36 meta‐learning with latent space clustering in a generative adversarial network for speaker diarization (SDS‐MCGAN), 37 a speaker diarization method using TMFCC parameterization with lion optimization (SDS‐TMFCC‐DNN‐LOA), 38 and a speaker diarization system using HXLPS with a deep neural network (SDS‐HXLP‐DNN). 39 …”
Section: Simulation Results (mentioning, confidence: 99%)
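DER (diarization error rate), the headline metric in the comparison above, sums missed speech, false-alarm speech, and speaker-confusion time over total reference speech time. A minimal frame-level sketch follows; real DER scoring additionally computes an optimal reference-to-hypothesis speaker mapping and applies a forgiveness collar, both omitted here, and the labels are assumed pre-aligned:

```python
def frame_der(ref, hyp):
    """Frame-level diarization error rate.

    ref and hyp are equal-length sequences; each entry is a speaker id,
    or None for non-speech. Speaker ids are assumed already mapped.
    """
    assert len(ref) == len(hyp)
    speech = sum(1 for r in ref if r is not None)          # total reference speech
    miss = sum(1 for r, h in zip(ref, hyp)
               if r is not None and h is None)             # missed speech
    fa = sum(1 for r, h in zip(ref, hyp)
             if r is None and h is not None)               # false alarm
    conf = sum(1 for r, h in zip(ref, hyp)
               if r is not None and h is not None and r != h)  # speaker confusion
    return (miss + fa + conf) / speech

# One frame out of four speech frames is attributed to the wrong speaker.
print(frame_der(["A", "A", "B", "B", None],
                ["A", "A", "A", "B", None]))  # 0.25
```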
“…Higher-performing offline algorithms such as k-means or spectral clustering can be used if the clustering is performed after all the points have been collected. Singh and Ganapathy [11] combined representation learning with agglomerative hierarchical clustering (AHC) for significant improvement over AHC alone. A fully-neural offline diarization system can also use unsupervised clustering [12].…”
Section: Introduction (mentioning, confidence: 99%)
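As an illustration of the offline clustering alternatives this statement mentions, here is a minimal Lloyd's k-means over segment embeddings. This is a generic sketch, not any cited system's implementation, and it assumes the number of speakers k is known:

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    # Lloyd's k-means: alternate nearest-centre assignment and centre updates.
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to the nearest centre (Euclidean distance).
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster is empty.
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Toy example: two well-separated blobs of segment embeddings.
x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(kmeans(x, 2))
```

Unlike AHC, k-means needs k up front, which is one reason threshold-stopped AHC remains the common choice for diarization.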