2020
DOI: 10.1007/s00521-020-04793-y

Robust features for text-independent speaker recognition with short utterances

Cited by 14 publications (7 citation statements)
References: 62 publications
“…However, the performance of the state-of-the-art speaker recognition systems dramatically deteriorates when short utterances are used for training/testing particularly with low SNR [29, 30]. In spite of the fact that deep neural networks (DNNs) provide a state-of-the-art tool for acoustic modeling, DNNs are data sensitive, and limited speech data as well as data-mismatch problems can deteriorate their performance [4, 5, 31].…”
Section: Related Work (mentioning; confidence: 99%)
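Because the excerpt above ties the degradation to low SNR, a minimal sketch of how a noisy test condition at a chosen SNR is commonly simulated may help. It uses the standard definition SNR_dB = 10·log10(P_signal / P_noise), with power taken as mean squared amplitude. The helper name `mix_at_snr` is hypothetical, and this procedure is an illustration, not the method used in the paper or in [29, 30].

```python
import math

def mean_power(x):
    """Average power (mean squared amplitude) of a sample sequence."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + scaled noise has the requested SNR in dB.

    Hypothetical helper for illustration; assumes equal-length, non-silent inputs.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_s = mean_power(speech)
    p_n = mean_power(noise)
    # Solve SNR_dB = 10 * log10(p_s / target_p_n) for the target noise power.
    target_p_n = p_s / (10 ** (snr_db / 10))
    # Amplitude gain that brings the noise power from p_n to target_p_n.
    gain = math.sqrt(target_p_n / p_n)
    return [s + gain * n for s, n in zip(speech, noise)]
```

For example, mixing speech of power 1.0 with noise of power 0.25 at 0 dB scales the noise by a gain of 2, so both components end up with equal power.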
“…However, deep learning systems require huge labeled speech databases for training; these databases also need to include phonetically rich sentences or at least phonetically balanced sentences [31]. In addition, most speaker recognition systems that were developed based on deep learning techniques have been applied to text-dependent speaker verification tasks [4]. Hence, training deep learning systems on limited data is a difficult task and may not necessarily lead to speaker recognition systems with state-of-the-art performance.…”
Section: Related Work (mentioning; confidence: 99%)
“…The ubiquitous noise masks or submerges the voice, changing the auditory, spectral, and other acoustic features. These changes make it harder to capture the target speaker's characteristics for speech processing, thereby reducing the performance of voice identity recognition [8]. Several methods have been widely used to compensate for mismatch in voice identification in noisy environments, such as a multiscale chaotic feature [9], signal enhancement [10], the model compensation method [11], and score normalization [12].…”
Section: Introduction (mentioning; confidence: 99%)
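Score normalization [12] is named above without detail. One widely used variant is zero normalization (Z-norm): a raw verification score for a target speaker model is standardized using the mean and standard deviation of that model's scores against a cohort of impostor utterances. The minimal sketch below assumes the impostor scores have already been computed; the function name `z_norm` is hypothetical, and the excerpt does not say whether [12] uses this exact variant.

```python
import statistics

def z_norm(raw_score, impostor_scores):
    """Z-norm: standardize a score with the target model's impostor-score statistics.

    Hypothetical helper; impostor_scores are scores of the same target model
    against utterances known to come from other speakers.
    """
    if len(impostor_scores) < 2:
        raise ValueError("need at least two impostor scores to estimate spread")
    mu = statistics.mean(impostor_scores)
    sigma = statistics.stdev(impostor_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("impostor scores have zero spread")
    return (raw_score - mu) / sigma
```

With impostor scores [1.0, 2.0, 3.0] (mean 2.0, sample standard deviation 1.0), a raw score of 4.0 normalizes to 2.0. Mapping every model's scores onto a common scale is what allows a single decision threshold to be shared across speakers.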
“…Besides noise, utterance duration strongly affects accuracy. [17] described the use of gammatone features in combination with i-vectors on short utterances; [18] trained deep convolutional networks specifically on short utterances. [1] described the use of inception networks, transforming utterances into fixed-length spectrograms and training via triplet loss.…”
Section: Introduction (mentioning; confidence: 99%)
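The last excerpt summarizes [1] as converting utterances to fixed-length spectrograms and training with a triplet loss. The sketch below illustrates only those two generic ingredients, not the inception architecture. `fix_length` (a hypothetical name) crops or zero-pads a frame sequence to a set number of frames, and `triplet_loss` implements the standard margin-based form max(0, d(a, p) - d(a, n) + margin) with squared Euclidean distance. The margin of 0.2, the crop-from-start policy, and the choice of distance are assumptions, not values reported in [1].

```python
from typing import List, Sequence

Frame = List[float]

def fix_length(frames: Sequence[Sequence[float]], num_frames: int) -> List[Frame]:
    """Crop or zero-pad a (frames x bins) spectrogram to exactly num_frames frames."""
    if not frames:
        raise ValueError("spectrogram must contain at least one frame")
    bins = len(frames[0])
    # Keep the first num_frames frames (one simple cropping policy); copy each row.
    out = [list(f) for f in frames[:num_frames]]
    while len(out) < num_frames:
        out.append([0.0] * bins)  # zero-pad utterances that are too short
    return out

def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """max(0, d(a, p) - d(a, n) + margin) on speaker embeddings.

    anchor and positive come from the same speaker, negative from a different one.
    """
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)
```

The loss is zero once the negative embedding is farther from the anchor than the positive by at least the margin, so training only pushes on triplets that still violate that separation. Random cropping during training is a common alternative to always keeping the first frames.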