ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414017

Towards Robust Speaker Verification with Target Speaker Enhancement

Abstract: This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all-neural model that couples target speaker enhancement with speaker embedding extraction for robust speaker verification (SV). Specifically, a speech enhancement module conditioned on the enrollment speaker serves as the front end, extracting the target speaker's speech from a mixture with interfering speakers and environmental noise. Compared with the conventional target speaker enhancement models, nontarget speake…

Cited by 18 publications (12 citation statements)
References: 28 publications
“…However, this is beyond the scope of this study and is left for future work. Figures 4, 5, 6, and 7 illustrate the EERs of the PLDA and GLASSO-PLDA on the evaluation trials, where the enrollment utterances were clean, whereas the test utterances had one of six conditions (i.e., two noise types: bus and cafe × three SNRs: 0, 5, and 10 dB). The black dotted vertical In Part 1, the average relative EER reductions in the evaluation trials were 4.1450% (0.74% to 9.16%) with the d-vector (see the r-vector (see Table 2).…”
Section: Results and Discussion 1) Demonstration Of The Conditional I... (citation type: mentioning)
confidence: 99%
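For readers unfamiliar with the metrics quoted in the statement above, the following is a minimal sketch of how an equal error rate (EER) and a relative EER reduction are commonly computed. It is not taken from the cited paper: the function names `compute_eer` and `relative_eer_reduction` and the toy scores are assumptions made for illustration, and the plain threshold sweep is a simple approximation rather than an interpolated ROC analysis.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate (as a fraction in [0, 1]).

    Hypothetical helper: sweeps every observed score as a threshold and
    returns the point where false acceptance and false rejection are closest.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every score seen in either trial set.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: nontarget trials scoring at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_eer_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent, measured against the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy scores (assumed values, not data from any cited study).
    targets = [0.9, 0.8, 0.3]
    nontargets = [0.1, 0.2, 0.4]
    print(compute_eer(targets, nontargets))   # approximately 0.333
    print(relative_eer_reduction(10.0, 9.0))  # 10.0
```

A relative reduction is reported against the baseline system's EER, so a drop from 10% to 9% EER counts as a 10% relative reduction even though the absolute change is one percentage point.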
“…Recently, many studies have been conducted to develop various types of noise-robust ASV systems [4], [5], [6], [7], [8]. We address noise robustness based on probabilistic linear discriminant analysis (PLDA) [9], [10].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…For instance, visual voice activity detection [5] might alleviate this issue. However, it is more challenging with audio clues [57], and further research may be required. 21 The results of the CHiME 6 challenge can be found at: https://chimechallenge.github.io/chime6/results.html.…”
Section: A Deployment Of Tse Systems (citation type: mentioning)
confidence: 99%
“…The first one is the Voxceleb corpus, which is our primary benchmark dataset to evaluate the proposed systems. The second one is our internal SV dataset, the test set is collected under the unconstrained vehicle environments in Mandarin (noted as "vehicle-spk") [7], the details of the corpora are listed below. Voxceleb: we use Voxceleb1 and Voxceleb2 training data for training the models.…”
Section: C3-dino Speaker Embedding System (citation type: mentioning)
confidence: 99%