Interspeech 2019
DOI: 10.21437/interspeech.2019-1410

Target Speaker Extraction for Multi-Talker Speaker Verification

Abstract: The performance of speaker verification degrades significantly when the test speech is corrupted by interference from non-target speakers. Speaker diarization separates speakers well only if they do not overlap. When multiple talkers speak at the same time, however, a technique is needed to separate their speech in the spectral domain. In this paper, we study a way to extract the target speaker's speech from overlapped multi-talker speech. Specifically, given some reference speech samples from the targ…
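The abstract motivates separating overlapped speakers in the spectral domain. As a minimal sketch of what such a separation aims at (not the paper's method, which the truncated abstract does not fully describe), the NumPy code below computes an oracle ideal ratio mask from two known sources and applies it to the mixture spectrogram. The helper names `stft_mag` and `ideal_ratio_mask`, the STFT settings, and the use of two pure tones as stand-ins for speakers are illustrative assumptions.

```python
import numpy as np


def stft_mag(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude STFT with a Hann window; returns shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def ideal_ratio_mask(target_mag: np.ndarray, interf_mag: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Oracle power-ratio mask in [0, 1].

    It needs the clean sources, so it is only an upper-bound reference,
    not a trained extractor (hypothetical helper for illustration).
    """
    t_pow, i_pow = target_mag ** 2, interf_mag ** 2
    return t_pow / (t_pow + i_pow + eps)


# Toy "speakers" (assumption): a 440 Hz tone as target, a 2 kHz tone as interferer, 1 s at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
interferer = 0.8 * np.sin(2 * np.pi * 2000 * t)
mixture = target + interferer

mask = ideal_ratio_mask(stft_mag(target), stft_mag(interferer))
extracted_mag = mask * stft_mag(mixture)  # target estimate in the magnitude domain

# With n_fft=256 at 8 kHz each bin spans 31.25 Hz: 440 Hz lies near bin 14, 2 kHz at bin 64.
print(mask[:, 14].mean(), mask[:, 64].mean())
```

A trained extractor has to estimate a mask like this from the mixture and a reference sample of the target speaker alone, because the clean sources are unavailable at test time.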

Cited by 27 publications (28 citation statements); references 22 publications.
“…We also report three competitive baselines (System 2, 11, and 18) that follow the target speaker extraction-verification (TSE-SV) pipeline [31], where speaker extraction and speaker verification modules are trained separately. Between the zero-effort baselines and competitive baselines, in particular, between System 17 and 18, we observe the following: (1) The target speaker extraction front-end greatly improves the SV performance under multi-talker test condition; (2) Among System 2, 11 and 18, the frequency-domain SV systems (SV-F and SV-FA) appear to be more robust than the time-domain counterpart (SV-T).…”
Section: Evaluating Target Speaker Embeddings on WSJ0-2talker Dataset (mentioning; confidence: 99%)
“…The idea of speaker extraction (SE) followed by speaker verification, i.e., SE-SV [31] pipeline, was previously studied to address speaker verification for multi-talker speech. The SE-SV system extracts the speech of the target speaker in the first stage, and subsequently processes the extracted speech with a standard speaker verification module, such as i-vector PLDA [1]-[3].…”
Section: Introduction (mentioning; confidence: 99%)
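The two-stage SE-SV pipeline described in this statement can be sketched as follows. This is a toy under stated assumptions: cosine scoring is swapped in for the i-vector PLDA back-end, `toy_embed` (a unit-norm average spectrum) stands in for a real speaker embedding, `toy_extract` is a hypothetical hand-made spectral mask rather than a trained extraction network, and the enrollment sample is a phase-shifted copy of the target tone. All function names are illustrative.

```python
import numpy as np
from typing import Callable


def stft_mag(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude STFT with a Hann window; returns shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def toy_embed(spec: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for an i-vector/x-vector extractor: unit-norm mean spectrum."""
    v = spec.mean(axis=0)
    return v / np.linalg.norm(v)


def no_extraction(mixture: np.ndarray, enroll_emb: np.ndarray) -> np.ndarray:
    """Zero-effort baseline: score the mixture as-is."""
    return stft_mag(mixture)


def toy_extract(mixture: np.ndarray, enroll_emb: np.ndarray, rel_threshold: float = 0.1) -> np.ndarray:
    """Hypothetical extraction front-end: keep only bins where the enrollment
    embedding is strong. A real extractor would learn this mask."""
    mask = (enroll_emb > rel_threshold * enroll_emb.max()).astype(float)
    return stft_mag(mixture) * mask  # mask of shape (bins,) broadcasts over frames


def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity (swapped in for PLDA scoring)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def se_sv_score(mixture: np.ndarray, enroll_emb: np.ndarray,
                extract: Callable[[np.ndarray, np.ndarray], np.ndarray],
                embed: Callable[[np.ndarray], np.ndarray] = toy_embed) -> float:
    """Stage 1: extract the target speaker; stage 2: embed the estimate and score it."""
    estimate = extract(mixture, enroll_emb)
    return cosine_score(embed(estimate), enroll_emb)


fs = 8000
t = np.arange(fs) / fs
mixture = np.sin(2 * np.pi * 440 * t) + 0.8 * np.sin(2 * np.pi * 2000 * t)
enroll_emb = toy_embed(stft_mag(np.sin(2 * np.pi * 440 * t + 0.3)))  # toy enrollment sample

baseline = se_sv_score(mixture, enroll_emb, no_extraction)
pipeline = se_sv_score(mixture, enroll_emb, toy_extract)
print(f"zero-effort: {baseline:.3f}  SE-SV: {pipeline:.3f}")
```

Because extraction and embedding are passed in as separate callables, each stage can be trained or replaced independently, mirroring the separately trained modules of the pipeline. With these toy signals the zero-effort score comes out clearly lower than the SE-SV score.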
“…Integrating target speaker enhancement into robust SV is still at an early stage. The most relevant work to us is [15], where SpeakerBeam-based target speaker enhancement is evaluated [19], and an i-vector model is employed as the SV model. In contrast with the pipeline system [15], we expect that an all-neural network solution with joint training leads to improved SV performance.…”
Section: Introduction (mentioning; confidence: 99%)
“…Previous works that consider overlapped speech for the SV task are limited. A single-channel front-end extractor was proposed in [11] to extract the target speaker's speech based on the enrollment speaker information, which greatly improved the performance of the GMM i-vector system in a pipeline mode.…”
Section: Introduction (mentioning; confidence: 99%)