Stephen Shum scite author profile

In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.

Improving On-Device Speaker Verification Using Federated Learning with Privacy

Granqvist, Seigel, Dalen, et al., 2020

Information on speaker characteristics can be useful as side information in improving speaker recognition accuracy. However, such information is often private. This paper investigates how privacy-preserving learning can improve a speaker verification system, by enabling the use of privacy-sensitive speaker data to train an auxiliary classification model that predicts vocal characteristics of speakers. In particular, this paper explores the utility achieved by approaches which combine different federated learning and differential privacy mechanisms. These approaches make it possible to train a central model while protecting user privacy, with users' data remaining on their devices. Furthermore, they make learning on a large population of speakers possible, ensuring good coverage of speaker characteristics when training a model. The auxiliary model described here uses features extracted from phrases which trigger a speaker verification system. From these features, the model predicts speaker characteristic labels considered useful as side information. The knowledge of the auxiliary model is distilled into a speaker verification system using multi-task learning, with the side information labels predicted by this auxiliary model being the additional task. This approach results in a 6 % relative improvement in equal error rate over a baseline system.
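
The abstract does not say which federated learning or differential privacy mechanisms were combined, so the following is only a minimal sketch of one common pattern, assumed here for illustration: each device sends a model update, the server clips every update to a maximum L2 norm, sums them, adds Gaussian noise, and averages. The function names, the update values, `max_norm`, and `noise_std` are all hypothetical and are not taken from the paper.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def private_federated_average(updates, max_norm, noise_std, rng):
    """Average clipped per-device updates, adding Gaussian noise to the sum."""
    dim = len(updates[0])
    total = [0.0] * dim
    for update in updates:
        clipped = clip_update(update, max_norm)
        for i in range(dim):
            total[i] += clipped[i]
    # Noise is added once to the server-side sum, then the sum is averaged.
    noisy = [t + rng.gauss(0.0, noise_std) for t in total]
    return [v / len(updates) for v in noisy]


# Hypothetical per-device updates; only these vectors leave each device,
# never the underlying audio.
device_updates = [[3.0, 4.0], [0.3, 0.4], [-0.6, 0.8]]
rng = random.Random(0)

no_noise = private_federated_average(
    device_updates, max_norm=1.0, noise_std=0.0, rng=rng)
with_noise = private_federated_average(
    device_updates, max_norm=1.0, noise_std=0.1, rng=rng)

print("clipped average without noise:", no_noise)
print("clipped average with noise:   ", with_noise)
```

Clipping bounds how much any single device can move the average, which is what lets the added noise mask an individual contribution. A real system would calibrate `noise_std` to a stated privacy budget, which this sketch does not attempt.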

Generalised Discriminative Transform via Curriculum Learning for Speaker Recognition

Marchi, Shum, Hwang, et al., 2018

Large-scale community detection on speaker content graphs

Shum, Campbell, Reynolds, 2013

We consider the use of community detection algorithms to perform speaker clustering on content graphs built from large audio corpora. We survey the application of agglomerative hierarchical clustering, modularity optimization methods, and spectral clustering as well as two random walk algorithms: Markov clustering and Infomap. Our results on graphs built from the NIST 2005+2006 and 2008+2010 Speaker Recognition Evaluations (SREs) provide insight into both the structure of the speakers present in the data and the intricacies of the clustering methods. In particular, we introduce an additional parameter to Infomap that improves its clustering performance on all graphs. Lastly, we also develop an automatic technique to purify the neighbors of each node by pruning away unnecessary edges.
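
Modularity optimization methods, one of the families surveyed above, score a partition by how many edges fall inside communities compared with what node degrees alone would predict. The sketch below computes the standard Newman modularity for a toy graph. It assumes an unweighted, undirected graph and a hypothetical six-node example; the content graphs in the paper may be weighted and are far larger, and this is not the authors' code.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    degree = {}
    internal = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal[c] = internal.get(c, 0) + 1

    degree_sum = {}
    for node, d in degree.items():
        c = community_of[node]
        degree_sum[c] = degree_sum.get(c, 0) + d

    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
# Nodes could stand for recordings, edges for "sounds like the same speaker".
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {node: "all" for node in range(6)}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print("two communities:", q_split)
print("one community:  ", q_single)
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which is the signal a modularity-based method would exploit when grouping recordings by speaker.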

The MIT LL 2010 speaker recognition evaluation system: Scalable language-independent speaker recognition

Sturim, Campbell, Dehak, et al., 2011

Stephen Shum

Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
