Interspeech 2017
DOI: 10.21437/interspeech.2017-883
Adversarial Network Bottleneck Features for Noise Robust Speaker Verification

Abstract: In this paper, we propose a noise-robust bottleneck feature representation generated by an adversarial network (AN). The AN comprises two cascade-connected networks: an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN, and the output of the EN is used as the noise-robust feature. The EN and DN are trained in turn; namely, when training the DN, noise types are selected as the training labels, and when …
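The alternating EN/DN training described in the abstract can be sketched with toy linear networks and hand-written gradients. This is a minimal illustration only: the linear layers, dimensions, data, and learning rate are assumptions for demonstration, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_BN, N_NOISE, N = 57, 16, 4, 256   # MFCC dim, bottleneck dim, noise types, frames (assumed)

# Toy stand-ins for MFCC frames and their noise-type labels.
X = rng.standard_normal((N, D_IN))
y = rng.integers(0, N_NOISE, size=N)

W_e = rng.standard_normal((D_IN, D_BN)) * 0.1    # encoding network (EN), linear for brevity
W_d = rng.standard_normal((D_BN, N_NOISE)) * 0.1  # discriminative network (DN)

def forward(W_e, W_d):
    z = X @ W_e                                   # bottleneck features produced by the EN
    logits = z @ W_d
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    ce = -np.log(p[np.arange(N), y] + 1e-12).mean()  # cross-entropy on noise type
    return z, p, ce

def grads(W_e, W_d):
    z, p, ce = forward(W_e, W_d)
    dlogits = p.copy()
    dlogits[np.arange(N), y] -= 1.0
    dlogits /= N
    dW_d = z.T @ dlogits                 # DN gradient (descend: classify the noise type)
    dW_e = X.T @ (dlogits @ W_d.T)       # EN gradient (ascend: hide the noise type)
    return dW_d, dW_e, ce

lr = 1e-2
for step in range(50):                   # train the two networks in turn
    dW_d, _, _ = grads(W_e, W_d)
    W_d -= lr * dW_d                     # DN step: recognise noise types
    _, dW_e, _ = grads(W_e, W_d)
    W_e += lr * dW_e                     # EN step: adversarially fool the DN

_, _, ce_final = forward(W_e, W_d)
print(round(float(ce_final), 3))
```

As the DN's noise-type loss is pushed up by the EN, the bottleneck output `z` carries less noise-type information, which is the intuition behind using it as a noise-robust feature.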

Cited by 32 publications (26 citation statements) | References 30 publications
“…To obtain the final BN feature, the output from a hidden layer, a 1024-dimensional deep feature, is projected onto a 57-dimensional space to align with the dimension of the MFCC feature for a fair comparison. Allowing a higher dimension for BN can potentially boost the performance, as observed in [4]. Deep features are normalized to zero mean and unit variance at the utterance level before using principal component analysis (PCA) for dimension reduction.…”
Section: B. The i-vector Method
confidence: 99%
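The normalize-then-PCA pipeline quoted above can be sketched as follows. This is a toy illustration: random features stand in for real deep features, and the function name and frame count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reduce_deep_features(feats, target_dim=57):
    """Utterance-level mean/variance normalization followed by PCA.

    feats: (n_frames, 1024) deep features from one utterance (toy stand-in).
    Returns (n_frames, target_dim) features, matching the 57-d MFCC dimension.
    """
    # Normalize to zero mean and unit variance at the utterance level.
    mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-8
    normed = (feats - mu) / sigma
    # PCA via SVD: project onto the top `target_dim` principal directions.
    _, _, vt = np.linalg.svd(normed, full_matrices=False)
    return normed @ vt[:target_dim].T

utterance = rng.standard_normal((300, 1024))  # 300 frames of 1024-d deep features
bn = reduce_deep_features(utterance)
print(bn.shape)  # (300, 57)
```

Because normalization happens per utterance before the projection, the reduced features remain zero-mean within each utterance.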
“…Recently, some adversarial training methods have been introduced to extract noise-invariant bottleneck features [64,188]. As shown in Figure 12, the adversarial network includes two parts: an encoding network (EN), which extracts noise-invariant features, and a discriminative network (DN), which judges the noise type of the noise-invariant feature generated by the EN.…”
Section: Speech Recognition and Verification for the Internet of …
confidence: 99%
“…As shown in Figure 12, the adversarial network includes two parts, i.e., an encoding network (EN), which extracts noise-invariant features, and a discriminative network (DN), which judges the noise type of the noise-invariant feature generated by the EN. Therefore, by adversarially training these two parts in turn, we can obtain robust noise-invariant features from the EN, which can improve the performance of a speaker verification system [64,188].…”
Section: Speech Recognition and Verification for the Internet of …
confidence: 99%
“…Recently, speaker representation models have moved from the commonly used i-vector model [1,2,3] with a probabilistic linear discriminant analysis (PLDA) back-end [4,5] to a new paradigm: speaker embeddings trained from deep neural networks. Various speaker embeddings based on different network architectures [6,7], attention mechanisms [8,9], loss functions [10,11], noise robustness [12,13], and training paradigms [14,15] have been proposed and have greatly improved the performance of speaker verification systems. Snyder et al. [6] recently proposed the x-vector model, which is based on a Time-Delay Neural Network (TDNN) architecture that computes speaker embeddings from variable-length acoustic segments.…”
Section: Introduction
confidence: 99%
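The key mechanism the x-vector citation describes, mapping a variable-length segment to a fixed-length embedding via frame-level layers plus statistics pooling, can be sketched as below. This is a simplified numpy illustration under assumed dimensions and random weights, not Snyder et al.'s actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def tdnn_layer(frames, weight, context=(-2, -1, 0, 1, 2)):
    """One time-delay layer: splice frames over a temporal context,
    then apply a linear map + ReLU. frames: (T, d_in); weight: (5*d_in, d_out)."""
    T, _ = frames.shape
    padded = np.pad(frames, ((2, 2), (0, 0)))
    spliced = np.concatenate([padded[2 + c : 2 + c + T] for c in context], axis=1)
    return np.maximum(spliced @ weight, 0.0)

def x_vector_sketch(frames, w1, w2):
    """Frame-level TDNN layers, then statistics pooling (mean and std over time),
    mapping a variable-length segment to a fixed-length embedding."""
    h = tdnn_layer(frames, w1)
    h = np.maximum(h @ w2, 0.0)
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])  # fixed length: 2 * d_h

d_in, d_h = 30, 64                       # assumed feature and hidden dimensions
w1 = rng.standard_normal((5 * d_in, d_h)) * 0.05
w2 = rng.standard_normal((d_h, d_h)) * 0.05

emb_short = x_vector_sketch(rng.standard_normal((80, d_in)), w1, w2)
emb_long = x_vector_sketch(rng.standard_normal((500, d_in)), w1, w2)
print(emb_short.shape, emb_long.shape)  # both (128,)
```

The pooling step is what decouples the embedding size from the segment length: an 80-frame and a 500-frame segment both yield a 128-dimensional vector.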