Interspeech 2017
DOI: 10.21437/interspeech.2017-221

Discriminative Autoencoders for Acoustic Modeling

Abstract: Speech data typically contain information irrelevant to automatic speech recognition (ASR), such as speaker variability and channel/environmental noise, lurking deep within acoustic features. Such unwanted information is invariably mixed into the features and hinders the development of an ASR system. In this paper, we propose a new framework based on autoencoders for acoustic modeling in ASR. Unlike other variants of autoencoder neural networks, our framework is able to isolate phonetic components from a speech utterance by s…

Cited by 5 publications (4 citation statements)
References 23 publications
“…where α and β are weighting factors. In previous studies in [29,30,31], only the CE and MSE were jointly optimized in AM training. In our experiments, we set α to 5 as in most previous studies and heuristically set β to 0.2.…”
Section: Final Objective Function (mentioning)
confidence: 99%
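The weighted objective described in the excerpt can be sketched as below. The excerpt only states that α and β are weighting factors (set to 5 and 0.2) over jointly optimized cross-entropy and mean-squared-error terms; which factor scales which term, and the function name itself, are assumptions for illustration.

```python
def joint_objective(ce_loss, mse_loss, alpha=5.0, beta=0.2):
    """Hypothetical joint AM objective: a weighted sum of a
    cross-entropy (classification) term and an MSE (reconstruction)
    term. The assignment of alpha and beta to the two terms is an
    assumption; the source only gives their values (5 and 0.2)."""
    return alpha * ce_loss + beta * mse_loss
```

With both component losses equal to 1.0, this toy combination yields 5.2, showing that the classification term dominates under the quoted weights.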
“…To import the advantage of unsupervised learning in [15], [16] into our acoustic modeling, the DNN model for generating the emission probabilities of the output labels can be regarded as the encoder, and the decoder is used to reconstruct the acoustic features, as shown in Figure 2. The output of the penultimate layer of the encoder is supposed to be a pure phoneme-related vector without any information irrelevant to the phonetic content.…”
Section: Proposed Model: A Filtering Mechanism for Acoustic Modeling (mentioning)
confidence: 99%
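The encoder-decoder arrangement described in this excerpt can be sketched numerically as follows: a DNN encoder produces state posteriors (emission probabilities), while a decoder reconstructs the acoustic features from the penultimate-layer "phonetic" vector. All layer sizes, weight initializations, and variable names here are illustrative assumptions, not the cited paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy dimensions (assumptions, not taken from the paper).
feat_dim, hidden_dim, num_states = 40, 64, 10

W1 = rng.standard_normal((feat_dim, hidden_dim)) * 0.1
W2 = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1   # penultimate layer
W_out = rng.standard_normal((hidden_dim, num_states)) * 0.1
W_dec = rng.standard_normal((hidden_dim, feat_dim)) * 0.1  # decoder weights

x = rng.standard_normal((1, feat_dim))     # one acoustic frame
h = relu(x @ W1)
phonetic = relu(h @ W2)                    # penultimate "phonetic" vector
emission = softmax(phonetic @ W_out)       # encoder output: state posteriors
x_hat = phonetic @ W_dec                   # decoder reconstructs the features

recon_mse = np.mean((x - x_hat) ** 2)      # reconstruction error term
```

The reconstruction error computed from the penultimate layer is what allows the training signal to push speaker- and noise-related variation out of that layer, in the spirit of the filtering mechanism the excerpt describes.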
“…In [15], [16], discriminative autoencoder-based (DcAE) acoustic modeling was proposed to separate the acoustic feature into phoneme, speaker, and environmental-noise components. Such model-space innovation takes great advantage of unsupervised learning to extract pure phonetic components from the acoustic feature to better recognize speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…The other, built up of interleaving TDNN layers and long short-term memory (LSTM) layers [23], is the TDNN-LSTM structure [24], which has a wide temporal context and performs as well as bidirectional LSTM networks with less latency [25]. Following the work of discriminative autoencoders (DcAEs) for acoustic modeling in [26], where the encoder layers were implemented only by deep FFNs with temporally augmented fMLLR features, we separately use TDNNs and TDNN-LSTM networks as encoders to see if the advantages of DcAEs can be sustained and developed. As described in [27], an autoencoder can help to preserve the most salient information.…”
Section: Introduction (mentioning)
confidence: 99%
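A TDNN layer of the kind mentioned in this excerpt can be sketched as a small time-delay computation: each output frame is a weighted sum of input frames at fixed context offsets. The function name, context offsets, and shapes below are illustrative assumptions, not a Kaldi-accurate implementation.

```python
import numpy as np

def tdnn_layer(x, w, context=(-1, 0, 1)):
    """Minimal TDNN (time-delay) layer sketch. Each output frame t is
    a ReLU of the sum over context offsets c of x[t + c] @ w[i],
    with out-of-range frames simply skipped at the boundaries.
    x: (T, D_in); w: (len(context), D_in, D_out)."""
    T = x.shape[0]
    D_out = w.shape[2]
    y = np.zeros((T, D_out))
    for i, c in enumerate(context):
        lo, hi = max(0, -c), min(T, T - c)   # valid output frame range
        y[lo:hi] += x[lo + c:hi + c] @ w[i]
    return np.maximum(0.0, y)                # ReLU activation
```

Stacking such layers with growing context offsets is what gives TDNNs their wide temporal receptive field; interleaving them with LSTM layers yields the TDNN-LSTM structure the excerpt contrasts with bidirectional LSTMs.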