2021
DOI: 10.1109/taslp.2021.3100682

Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech

Abstract: Speaker verification has been studied mostly under the single-talker condition. It is adversely affected in the presence of interference speakers. Inspired by the study on target speaker extraction, e.g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech that is able to pay selective auditory attention to the target speaker. This target speaker verification (tSV) framework jointly optimizes a speaker attention module and a speaker representation module via multi…

Cited by 20 publications (5 citation statements)
References 42 publications
“…The room impulse responses (RIRs) in MC-Libri2Mix is simulated using pyroomacoustics package. For the room configurations, the length and width of each room are randomly drawn in the range [5, 10] m, and the height is selected in the range [3, 4] m. The reverberation time (RT60) of the reverberant data ranges from 200 ms to 600 ms.…”
Section: Experiments and Discussion (mentioning)
confidence: 99%
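
The room configuration quoted above can be sketched as a small sampler. This is only an illustration under stated assumptions: the quote gives the ranges but not the sampling distribution, so a uniform draw is assumed here, and the names `RoomConfig` and `sample_room_config` are hypothetical rather than taken from the citing paper.

```python
import random
from dataclasses import dataclass


@dataclass
class RoomConfig:
    """One simulated room (hypothetical container, not from the cited work)."""
    length_m: float
    width_m: float
    height_m: float
    rt60_s: float


def sample_room_config(rng: random.Random) -> RoomConfig:
    """Draw one room using the ranges given in the quoted statement.

    Assumption: values are drawn uniformly; the quote only says "randomly drawn".
    """
    return RoomConfig(
        length_m=rng.uniform(5.0, 10.0),  # length in [5, 10] m
        width_m=rng.uniform(5.0, 10.0),   # width in [5, 10] m
        height_m=rng.uniform(3.0, 4.0),   # height in [3, 4] m
        rt60_s=rng.uniform(0.2, 0.6),     # RT60 in [200, 600] ms, stored in seconds
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draw is reproducible
    print(sample_room_config(rng))
```

A configuration drawn this way could then be handed to a room acoustics simulator such as pyroomacoustics to generate the impulse responses; that step is omitted here because the quote does not describe it.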
“…never stop seeking the engineering solution to confer human's selective attention capability on machines, as there is high demand for various real-world applications, such as speech recognition [2,3] and speaker verification [4,5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Equal error rates (EER) and detection cost function (DCF) are applied to evaluate the performance of speaker verification systems (Xu et al., 2021). The evaluation metrics EER and DCF refer to two parameters, which are False Acceptation Rate (FAR) and False Rejection Rate (FRR).…”
Section: Methods (mentioning)
confidence: 99%
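
To make the relationship between these metrics concrete, the sketch below computes the false acceptance and false rejection rates at a threshold, approximates the EER by sweeping thresholds over the observed scores, and evaluates a detection cost. It is a minimal illustration, not the evaluation code of any cited paper: the function names are hypothetical, the convention "accept when score >= threshold" is an assumption, and the default cost parameters (`p_target`, `c_miss`, `c_fa`) are illustrative values for the usual weighted-sum form of the DCF.

```python
def far_frr(target_scores, nontarget_scores, threshold):
    """Return (FAR, FRR) when a trial is accepted if its score >= threshold."""
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    return far, frr


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest;
    the EER is reported as the mean of FAR and FRR at that point.
    """
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        far, frr = far_frr(target_scores, nontarget_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


def detection_cost(far, frr, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Weighted sum of miss and false-alarm costs (parameter values are assumptions)."""
    return c_miss * frr * p_target + c_fa * far * (1.0 - p_target)


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.4]      # scores of same-speaker trials (toy data)
    nontargets = [0.6, 0.3, 0.2, 0.1]   # scores of different-speaker trials (toy data)
    eer, thr = equal_error_rate(targets, nontargets)
    print(eer, thr)
```

Sweeping only the observed scores keeps the example short; with many trials, interpolating between adjacent thresholds gives a smoother EER estimate.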
“…Besides the works on feature learning (extraction), many efforts were made on the construction of back-end classifiers for SV. The typical classifiers mainly included the Cosine Distance (CD) [1], Probabilistic Linear Discriminant Analysis (PLDA) [36], [37], and deep neural network [38].…”
Section: Related Work (mentioning)
confidence: 99%