2020
DOI: 10.48550/arxiv.2002.06033
Preprint

Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances

Cited by 7 publications (8 citation statements)
References 31 publications

“…As demonstrated in [25], in the conventional training setting, a mismatch in length between training and test speech can degrade performance on short utterances. In other words, as shown in [24], a model trained on short segments performs better on short test utterances than a model trained on relatively long speech, but performs comparatively poorly on long speech. This trade-off is critical in realistic settings, as it leads to less discriminative embeddings of the enrollment or test utterances.…”
Section: Meta-Learning for Imbalanced Length Pairs
confidence: 82%
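
The trade-off described in this excerpt can be made concrete with a small evaluation sketch: fix a speaker-embedding extractor trained on a given segment length, then measure verification error separately on short and long test utterances. The Python below is a minimal illustration assuming a hypothetical embed(wav) extractor and a list of trials; none of these names come from the cited papers.

import numpy as np

def crop(wav, seconds, sr=16000):
    # Keep only the first `seconds` of audio to simulate a short test utterance.
    return wav[: int(seconds * sr)]

def cosine_score(a, b):
    # Cosine similarity between two fixed-dimensional speaker embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def equal_error_rate(scores, labels):
    # EER: operating point where false-accept and false-reject rates coincide.
    scores, labels = np.asarray(scores), np.asarray(labels)
    best_far, best_frr = 1.0, 0.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)  # impostor trials accepted
        frr = np.mean(scores[labels == 1] < t)   # target trials rejected
        if abs(far - frr) < abs(best_far - best_frr):
            best_far, best_frr = far, frr
    return (best_far + best_frr) / 2.0

def eer_at_test_length(embed, trials, test_seconds):
    # `embed` maps a waveform to an embedding; `trials` holds
    # (enroll_wav, test_wav, same_speaker) tuples.
    scores = [cosine_score(embed(e), embed(crop(t, test_seconds)))
              for e, t, _ in trials]
    labels = [int(same) for _, _, same in trials]
    return equal_error_rate(scores, labels)

# Comparing a short-trained and a long-trained extractor at, say, 2 s and 10 s
# test lengths would expose the trade-off the excerpt describes:
#   for name, embed in [("2s-trained", embed_short), ("10s-trained", embed_long)]:
#       for sec in (2.0, 10.0):
#           print(name, sec, eer_at_test_length(embed, trials, sec))
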
“…However, TDV performs well only on short utterances and shows relatively low performance on long ones. In addition to these methods, there have been many attempts to solve this problem, such as knowledge distillation [21], generative adversarial networks [22], and angular-margin-based methods [23,24].…”
Section: Related Work
confidence: 99%
“…Recently, deep neural network (DNN) and convolutional neural network (CNN) based speaker embedding systems have been applied to this problem and have achieved effective performance improvements [9,10,11,12]. In [10], a raw-waveform CNN-LSTM architecture was proposed to extract phonetic-level features, which can help compensate for missing phonetic information.…”
Section: Introduction
confidence: 99%
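
As a rough illustration of the kind of raw-waveform CNN-LSTM embedding extractor this excerpt attributes to [10], the Python (PyTorch) sketch below stacks 1-D convolutions over the waveform, an LSTM over the resulting frame sequence, and temporal mean pooling into a fixed-size embedding. The layer sizes and overall layout are illustrative assumptions, not the architecture of the cited paper.

import torch
import torch.nn as nn

class RawWaveCnnLstmEmbedder(nn.Module):
    # Minimal sketch: convolutional front-end over raw audio, LSTM over frames,
    # mean pooling to a fixed-size, length-normalized speaker embedding.
    # All layer sizes are assumptions made for this example.
    def __init__(self, emb_dim: int = 256):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=251, stride=80),  # ~5 ms hop at 16 kHz
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2),
            nn.BatchNorm1d(128),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=128, hidden_size=emb_dim, batch_first=True)
        self.proj = nn.Linear(emb_dim, emb_dim)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) raw waveform at 16 kHz
        feats = self.frontend(wav.unsqueeze(1))        # (batch, 128, frames)
        seq, _ = self.lstm(feats.transpose(1, 2))      # (batch, frames, emb_dim)
        emb = self.proj(seq.mean(dim=1))               # temporal mean pooling
        return nn.functional.normalize(emb, dim=-1)    # unit-length embedding

# Usage: a 2-second utterance at 16 kHz yields one fixed-size embedding,
# regardless of the input duration.
model = RawWaveCnnLstmEmbedder()
embedding = model(torch.randn(1, 32000))
print(embedding.shape)  # torch.Size([1, 256])

Because the pooling step averages over time, the same model accepts utterances of any length, which is what makes the short-versus-long mismatch discussed above possible in the first place.
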