Interspeech 2019
DOI: 10.21437/interspeech.2019-1717
Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR

Abstract: Discriminative autoencoders (DcAEs) have been shown to improve the generalization of learned acoustic models by increasing their capacity to reconstruct the input features from the frame embeddings. In this paper, we integrate DcAEs into two models commonly adopted in Kaldi recipes for LVCSR in recent years, namely TDNNs and LSTMs, using a modified nnet3 neural network library. We also explore two skip-connection mechanisms for DcAEs: concatenation and addition. The resul…
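The two skip-connection variants named in the abstract (concatenation and addition) can be illustrated with a minimal NumPy sketch. The layer shapes and names here are illustrative assumptions, not taken from the paper's nnet3 implementation:

```python
import numpy as np

def dense(x, w, b):
    # simple affine layer with ReLU activation
    return np.maximum(0.0, x @ w + b)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))        # a batch of 4 frame embeddings
w1 = rng.standard_normal((16, 16))
b1 = np.zeros(16)

h = dense(x, w1, b1)

# additive skip connection: output keeps the input dimensionality
add_skip = h + x                        # shape (4, 16)

# concatenative skip connection: dimensionality doubles
concat_skip = np.concatenate([h, x], axis=1)  # shape (4, 32)
```

The practical difference is that addition requires matching layer widths, while concatenation lets the decoder see the raw input alongside the transformed features at the cost of a wider following layer.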

Cited by 8 publications (4 citation statements) | References 29 publications
“…The sequence-to-sequence (S2S) approach [1] has achieved remarkable results in automatic speech recognition (ASR), in particular, large vocabulary continuous speech recognition (LVCSR) [2][3][4][5][6][7][8]. Unlike the conventional hybrid ASR, S2S requires neither lexicons, nor prerequisite models or decision trees.…”
Section: Introduction
confidence: 99%
“…where α and β are weighting factors. In previous studies in [29,30,31], only the CE and MSE were jointly optimized in AM training. In our experiments, we set α to 5 as in most previous studies and heuristically set β to 0.2.…”
Section: Final Objective Function
confidence: 99%
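The citing paper's objective jointly optimizes cross-entropy (CE) and a mean-squared reconstruction error (MSE) with weights α and β; the exact formula is cut off in the excerpt, so the weighting below is one plausible form, with the reported values α = 5 and β = 0.2 used as defaults:

```python
import numpy as np

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the correct output labels
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def mse(recon, feats):
    # mean squared reconstruction error of the input features
    return np.mean((recon - feats) ** 2)

def joint_loss(ce, mse_val, alpha=5.0, beta=0.2):
    # hypothetical weighting: the mapping of alpha/beta to terms
    # is an assumption, not visible in the excerpt
    return alpha * ce + beta * mse_val

# toy values, illustrative only
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = joint_loss(cross_entropy(probs, labels), 0.1)
```

Jointly minimizing CE and MSE pushes the frame embeddings to stay predictive of the labels while retaining enough information to reconstruct the acoustic input.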
“…To import the advantage of unsupervised learning in [15], [16] into our acoustic modeling, the DNN model for generating the emission probabilities of the output labels can be regarded as the encoder, and the decoder is used to reconstruct the acoustic features, as shown in Figure 2. The output of the penultimate layer of the encoder is supposed to be a pure phoneme-related vector without any information irrelevant to the phonetic content.…”
Section: Proposed Model: A Filtering Mechanism for Acoustic Modeling
confidence: 99%
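The encoder/decoder split described in the quote above can be sketched as follows. All dimensions and weight names are hypothetical; the point is that the penultimate encoder layer serves as the phoneme-related embedding, which feeds both the label classifier and the feature-reconstructing decoder:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

feats = rng.standard_normal((8, 40))      # 8 frames of 40-dim acoustic features

# encoder: the penultimate activation is the phoneme-related embedding
w_enc = rng.standard_normal((40, 64))
w_emb = rng.standard_normal((64, 32))
w_out = rng.standard_normal((32, 100))    # 100 output labels (e.g. senones)

h = relu(feats @ w_enc)
phone_emb = relu(h @ w_emb)               # penultimate layer: phoneme-related vector
logits = phone_emb @ w_out                # emission scores for the output labels

# decoder: reconstructs the acoustic features from the embedding
w_dec = rng.standard_normal((32, 40))
recon = phone_emb @ w_dec
recon_err = np.mean((recon - feats) ** 2)
```

Because the same embedding must support both classification and reconstruction, training encourages it to keep phonetic content while the decoder accounts for the remaining variability.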
“…In [15], [16], discriminative autoencoder-based (DcAE) acoustic modeling was proposed to separate the acoustic feature into the components of phoneme, speaker and environmental noise. Such model-space innovation takes a great advantage of unsupervised learning to extract pure phonetic components from the acoustic feature to better recognize speech.…”
Section: Introduction
confidence: 99%