Interspeech 2019
DOI: 10.21437/interspeech.2019-1746

Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification

Abstract: In this paper, gating mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, a gated convolution neural network (GCNN) is employed for modeling the frame-level embedding layers. Compared with the time-delay DNN (TDNN), the GCNN can obtain more expressive frame-level representations through carefully designed memory cell and gating mechanisms. Moreover, we propose a novel gated-attention statistics pooling strategy in which the attention sco…
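
For readers unfamiliar with gated convolutions, the following is a minimal NumPy sketch of one common formulation: a gated-linear-unit style layer in which a linear convolution over a window of frames is multiplied elementwise by a sigmoid gate computed from the same window. It is an illustration only and is not taken from the paper. The abstract does not give the GCNN's memory cell or gate equations, and every name (conv1d_frames, gated_conv1d), shape, and kernel width below is an assumption.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_frames(x, w, b):
    """'Valid' 1-D convolution over the time axis.

    x: (T, D_in) frame-level features
    w: (K, D_in, D_out) kernel spanning K consecutive frames
    b: (D_out,) bias
    returns: (T - K + 1, D_out)
    """
    k = w.shape[0]
    t_out = x.shape[0] - k + 1
    out = np.empty((t_out, w.shape[2]))
    for t in range(t_out):
        # Contract the K-frame context window with the kernel.
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out


def gated_conv1d(x, w_lin, b_lin, w_gate, b_gate):
    """Hypothetical gated convolution: linear path times sigmoid gate.

    This is the generic GLU-style gate, assumed here for illustration;
    it is not the memory-cell design referred to in the abstract.
    """
    linear = conv1d_frames(x, w_lin, b_lin)
    gate = sigmoid(conv1d_frames(x, w_gate, b_gate))
    return linear * gate
```

With a 3-frame kernel, an input of 10 frames yields 8 output frames. Each output dimension is scaled by a gate value between 0 and 1, which lets the layer suppress or pass context on a per-frame basis.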

Cited by 18 publications (11 citation statements)
References 19 publications
“…Recently, [15] proposed the usage of Gated Convolutional Neural Networks (GCNN) for speaker recognition. Matched with a gated-attention pooling method for frame-level feature aggregation, they evaluate the performance of GCNN in an x-vector [16] system on SRE16 and SRE18 datasets.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [14] a Gaussian constrained training approach is proposed that impose on x-vectors to have a Gaussian distribution. Using gated convolutional layers instead of time delay layers and gated pooling layer has improved the performance of x-vector system [15]. In [16], a hybrid LSTM and CNN network used for frame level layers and by using multi-level pooling strategy and applying a regularization scheme on embedding layer, the performance of x-vector baseline system was improved.…”
Section: Introduction (mentioning)
confidence: 99%
“…By introducing deep learning into anti-spoofing, deep neural networks (DNN) have achieved promising results in anti-spoofing of ASVspoof 2017 [9]- [11] and ASVspoof 2019 [12]. ASV as a standalone task has also gained great improvement from deep learning [13], [14]. Given those achievements and in order to make ASV and anti-spoofing a step forward to practical usage, some early studies have proposed that a separately designed antispoofing system is implement before ASV, only the utterances which have passed spoofing detection are verified again by a ASV system [4], [14], which is in fact a cascaded structure as is illustrated in FIGURE 1.…”
Section: Introduction (mentioning)
confidence: 99%