Interspeech 2019
DOI: 10.21437/interspeech.2019-2168

Variational Domain Adversarial Learning for Speaker Verification

Abstract: Domain mismatch refers to the problem in which the distribution of training data differs from that of the test data. This paper proposes a variational domain adversarial neural network (VDANN), which consists of a variational autoencoder (VAE) and a domain adversarial neural network (DANN), to reduce domain mismatch. The DANN part aims to retain speaker identity information and learn a feature space that is robust against domain mismatch, while the VAE part imposes variational regularization on the learned…
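To make the architecture described in the abstract concrete, below is a minimal PyTorch-style sketch of a VDANN-like objective: a VAE (reconstruction plus KL regularization) combined with a speaker classifier and a domain classifier trained through a gradient reversal layer. The module names, layer sizes, input dimensionality, and equal loss weighting are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class VDANN(nn.Module):
    """Sketch: VAE encoder/decoder plus a speaker classifier and a domain adversary."""
    def __init__(self, feat_dim=512, z_dim=200, n_speakers=1000, n_domains=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, z_dim)          # posterior mean
        self.logvar = nn.Linear(512, z_dim)      # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                                 nn.Linear(512, feat_dim))
        self.spk_clf = nn.Linear(z_dim, n_speakers)   # retains speaker identity
        self.dom_clf = nn.Linear(z_dim, n_domains)    # adversary over domain labels

    def forward(self, x, lam=1.0):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        x_hat = self.dec(z)
        spk_logits = self.spk_clf(z)
        dom_logits = self.dom_clf(GradientReversal.apply(z, lam))
        return x_hat, mu, logvar, spk_logits, dom_logits

def vdann_loss(x, x_hat, mu, logvar, spk_logits, dom_logits, spk_y, dom_y):
    recon = F.mse_loss(x_hat, x)                                      # VAE reconstruction
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())    # variational regularizer
    spk = F.cross_entropy(spk_logits, spk_y)                          # keep speaker information
    dom = F.cross_entropy(dom_logits, dom_y)                          # adversarial term via the GRL
    return recon + kld + spk + dom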

Cited by 50 publications (24 citation statements); References 22 publications.

Citation statements (ordered by relevance):
“…Note that the posterior for the spoof class is 1 − r_ψ(x) as there are two classes. Inspired by [38] and [44], we consider two different AC setups. First, following [38], we use the mean µ_z as the input to an AC which is a feedforward neural network with a single hidden layer.…”
Section: Conditioning VAEs by Class Label (mentioning confidence: 99%)
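A minimal sketch of the first setup described in this citation statement, assuming a binary bona fide/spoof task and illustrative layer sizes: an auxiliary classifier (AC) with a single hidden layer that takes the VAE posterior mean µ_z as input. The class and parameter names are hypothetical.

import torch.nn as nn

class LatentAC(nn.Module):
    """Sketch: auxiliary classifier with one hidden layer, fed the VAE posterior mean."""
    def __init__(self, z_dim=128, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # r_ψ(x): posterior of the bona fide class
        )

    def forward(self, mu_z):
        r = self.net(mu_z)        # bona fide posterior
        return r                  # the spoof posterior is 1 - r, since there are two classes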
“…Inspired by [38] and [44], we consider two different AC setups. First, following [38], we use the mean µ_z as the input to an AC which is a feedforward neural network with a single hidden layer. Second, following [44], we augment a deep-CNN as an AC to the output of the decoder network.…”
Section: Conditioning VAEs by Class Label (mentioning confidence: 99%)
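A correspondingly minimal sketch of the second setup, a CNN auxiliary classifier applied to the decoder's reconstruction. It assumes the reconstruction can be treated as a single-channel time-frequency map; the depth, channel counts, and names are illustrative rather than the cited paper's configuration.

import torch.nn as nn

class DecoderOutputAC(nn.Module):
    """Sketch: CNN auxiliary classifier on the decoder's reconstructed feature map."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # pool over time and frequency
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x_hat):                 # x_hat: (batch, 1, freq, time)
        h = self.conv(x_hat).flatten(1)
        return self.fc(h)                     # class logits over bona fide / spoof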
“…A promising research direction in this context is domain adversarial training to make speaker representations robust to recording conditions [12][13][14][15]. However, a majority of these techniques are supervised, i.e., they require labelled nuisance factors, which might not be readily available in many real-world scenarios.…”
Section: Introduction (mentioning confidence: 99%)
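To make the supervision requirement mentioned above concrete, here is a minimal sketch of one alternating domain-adversarial update: each batch must come with a domain (or dataset) label dom_y, which is exactly the labelled nuisance factor the quote refers to. The encoder, classifier, and hyperparameters are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical components: a speaker-embedding encoder and a domain classifier.
encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
domain_clf = nn.Linear(128, 3)                 # e.g. 3 recording conditions / datasets
opt_e = torch.optim.Adam(encoder.parameters())
opt_d = torch.optim.Adam(domain_clf.parameters())

def adversarial_step(x, dom_y, lam=0.1):
    # 1) Train the domain classifier on the labelled domains (the required supervision).
    d_loss = F.cross_entropy(domain_clf(encoder(x).detach()), dom_y)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Train the encoder to fool the domain classifier by maximizing its loss.
    e_loss = -lam * F.cross_entropy(domain_clf(encoder(x)), dom_y)
    opt_e.zero_grad(); e_loss.backward(); opt_e.step()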
“…Previous work in adversarial learning of speaker representation has encouraged domain invariance by having an adversary classify the dataset or labelled environment to which the generated features belong [4,12]. However, this is a coarse modelling of the domains over which generated features are encouraged to be invariant.…”
Section: Introduction (mentioning confidence: 99%)
“…However, this is a coarse modelling of the domains over which generated features are encouraged to be invariant. In the case of dataset adversarial training [12], for instance, intra-dataset variation is not penalized, instead relying on the differences between datasets being enough to encourage meaningful invariance.…”
Section: Introduction (mentioning confidence: 99%)