2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003860
Joint Optimization of Classification and Clustering for Deep Speaker Embedding

Cited by 5 publications (3 citation statements) · References 22 publications
“…Recent deep learning based speaker verification approaches can be categorized into two main directions: advanced network structure design [1,2,3,4,18] and effective loss function design [6,19,20,21].…”
Section: Related Work
Confidence: 99%
“…Various loss functions have been studied for speaker verification. Wang et al. [20] jointly optimize classification and clustering with a large margin softmax loss and a large margin Gaussian mixture loss. The logistic affinity loss [19] instead optimizes an end-to-end speaker verification model by learning a decision boundary that separates similar pairs from dissimilar pairs.…”
Section: Related Work
Confidence: 99%
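As a concrete illustration of the loss-design direction discussed in this excerpt, below is a minimal sketch of an additive-margin softmax, one common member of the large-margin softmax family. The scale `s`, margin `m`, and class name are illustrative assumptions, not the exact formulation used in [20].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxLoss(nn.Module):
    """Additive-margin softmax: a common large-margin softmax variant
    (illustrative sketch; the s and m defaults are assumptions)."""

    def __init__(self, embed_dim, num_speakers, s=30.0, m=0.2):
        super().__init__()
        # One weight vector per speaker, compared to embeddings by cosine.
        self.weight = nn.Parameter(torch.randn(num_speakers, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarities between L2-normalized embeddings and weights.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Subtract the margin m from the target-class cosine only, forcing
        # the target similarity to exceed the others by at least m.
        one_hot = F.one_hot(labels, cos.size(1)).float()
        logits = self.s * (cos - self.m * one_hot)
        return F.cross_entropy(logits, labels)
```

In use, such a loss replaces the network's final linear-plus-softmax layer during training, while verification scores embeddings directly by cosine similarity.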
“…In the first, the network is trained as a multi-class classifier over a large number of classes (speakers in our case). These networks use objectives that augment traditional classification losses with terms intended to encourage tighter within-class clustering of embeddings [3,24,25,26,27,28,29] along with increased separation between embeddings of instances from different classes. The expectation is that this behavior will generalize to data outside the training set.…”
Section: Introduction
Confidence: 99%
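A hedged sketch of one such augmented objective: standard cross-entropy paired with a center-loss style term that pulls each embedding toward its class center, encouraging within-class clustering. The weighting `lambda_c` and the class name are illustrative assumptions, not the objective of any specific cited work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationWithClusteringLoss(nn.Module):
    """Cross-entropy plus a center-loss style clustering term
    (illustrative sketch; lambda_c is an assumed weighting)."""

    def __init__(self, embed_dim, num_classes, lambda_c=0.01):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_classes)
        # One learnable center per class, trained jointly with the network.
        self.centers = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.lambda_c = lambda_c

    def forward(self, embeddings, labels):
        # Standard multi-class classification loss over speakers.
        ce = F.cross_entropy(self.classifier(embeddings), labels)
        # Penalize squared distance from each embedding to its class center,
        # encouraging within-class compactness of the embedding space.
        cluster = (embeddings - self.centers[labels]).pow(2).sum(dim=1).mean()
        return ce + self.lambda_c * cluster
```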