2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003794

Optimizing Neural Network Embeddings Using a Pair-Wise Loss for Text-Independent Speaker Verification

Cited by 4 publications (4 citation statements)
References 20 publications

“…Recent deep learning based speaker verification approaches can be primarily categorized into two main aspects: advanced network structure construction [1,2,3,4,18] and effective loss function design [6,19,20,21].…”
Section: Related Work
confidence: 99%
“…Logistic affinity loss [19] instead optimizes an end-to-end speaker verification model by building a learnable decision boundary to distinguish similar pairs from dissimilar pairs. The quartet loss [21] explicitly computes a pair-wise distance loss in the embedding space and increases the gap between the similarity score distributions of same-class pairs and different-class pairs. Self-adaptive soft voice activity detection (VAD) [31] incorporates a deep neural network based VAD into a deep speaker embedding system to reduce the domain mismatch.…”
Section: Related Work
confidence: 99%
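A minimal sketch can make the pair-wise idea in the excerpt above concrete: a contrastive-style loss that pulls same-speaker embedding pairs together and pushes different-speaker pairs apart by a margin, which is what widens the gap between the two score distributions. This is an illustrative PyTorch sketch, not the paper's exact pair-wise or quartet loss; the function name, the cosine-distance choice, and the margin value are assumptions.

import torch
import torch.nn.functional as F

def pairwise_embedding_loss(emb_a, emb_b, same_speaker, margin=0.5):
    # Illustrative contrastive-style pair-wise loss (not the paper's exact formulation).
    # emb_a, emb_b: (batch, dim) embeddings for the two sides of each trial pair.
    # same_speaker: (batch,) float tensor, 1.0 for same-speaker pairs, 0.0 otherwise.
    emb_a = F.normalize(emb_a, dim=1)
    emb_b = F.normalize(emb_b, dim=1)
    dist = 1.0 - (emb_a * emb_b).sum(dim=1)  # cosine distance per pair
    # Same-speaker pairs are pulled together; different-speaker pairs are pushed
    # beyond the margin, separating the two similarity score distributions.
    loss = same_speaker * dist.pow(2) + (1.0 - same_speaker) * F.relu(margin - dist).pow(2)
    return loss.mean()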
“…I-vectors are statistical low-dimensional representations over the distributions of spectral features, and are commonly used in state-of-the-art speaker recognition systems [31] and age estimation systems [32], [33]. 400-dimensional and 600-dimensional i-vectors are extracted for the Fisher and SRE datasets, respectively, using the state-of-the-art speaker identification system [34].…”
Section: A. Data
confidence: 99%
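For context, the i-vectors mentioned in this excerpt come from the standard total-variability model; the recap below is textbook material, not a detail taken from the cited system:

\[
  M = m + T\,w
\]

Here M is the utterance-dependent GMM mean supervector, m is the speaker-independent UBM mean supervector, T is the low-rank total variability matrix, and the posterior mean of the latent factor w is the i-vector (400- or 600-dimensional in the excerpt above).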
“…In the training process of CNNs, deep metric learning losses such as the pair-wise loss [9,10], triplet loss [11,12], and n-pair loss [13] are used. The above methods are especially applied to FGIR to extract more accurate features from the tiny differences between objects.…”
Section: Introduction
confidence: 99%
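To anchor the losses named in the last excerpt, here is a minimal triplet-loss sketch in the same illustrative PyTorch style; the function name, the margin value, and the Euclidean distance are assumptions for illustration rather than the exact formulations of the cited works.

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    # anchor, positive, negative: (batch, dim) embeddings; positive shares the anchor's
    # class (e.g. same speaker or same fine-grained category), negative does not.
    d_ap = F.pairwise_distance(anchor, positive)  # anchor-positive distance
    d_an = F.pairwise_distance(anchor, negative)  # anchor-negative distance
    # Penalize triplets where the negative is not at least `margin` farther away
    # from the anchor than the positive.
    return F.relu(d_ap - d_an + margin).mean()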