Interspeech 2016
DOI: 10.21437/interspeech.2016-1159

The IBM Speaker Recognition System: Recent Advances and Error Analysis

Abstract: We present the recent advances along with an error analysis of the IBM speaker recognition system for conversational speech. Some of the key advancements that contribute to our system include: a nearest-neighbor discriminant analysis (NDA) approach (as opposed to LDA) for intersession variability compensation in the i-vector space, the application of speaker and channel-adapted features derived from an automatic speech recognition (ASR) system for speaker recognition, and the use of a DNN acoustic model with a…

Cited by 13 publications (2 citation statements)
References: 26 publications

“…In speaker recognition, i-Vector/PLDA systems achieve very impressive performances in the presence of noise and channel variabilities [1,2,3,4]. However, these systems often rely on a large collection of in-domain and well-annotated data, e.g., transcriptions for ASR DNN acoustic modeling and speaker labels for PLDA training [5,6].…”
Section: Introduction (mentioning)
confidence: 99%
“…UT-SCOPE-physical speech corpus is employed as the in-domain data [19]. Instead of focusing on duration, noise or channel variabilities like the conventional NIST SREs [3,20], the corpus addresses more on the variability introduced by speakers themselves (i.e., intrinsic neutral/physical stressed mismatch). To create the mismatched conditions, neutral-read speech, stressed-read speech and stressed-spontaneous speech are collected from each speaker.…”
Section: Introduction (mentioning)
confidence: 99%