Interspeech 2017
DOI: 10.21437/interspeech.2017-368
RNN-LDA Clustering for Feature Based DNN Adaptation

Abstract: Model-based deep neural network (DNN) adaptation approaches often require multi-pass decoding at test time. Input-feature-based DNN adaptation, for example based on latent Dirichlet allocation (LDA) clustering, provides a more efficient alternative. Conventional LDA clustering ignores the transitions and correlations between neighboring clusters. To address this issue, a recurrent neural network (RNN) based clustering scheme is proposed to learn both the standard LDA cluster labels and their natu…

Cited by 5 publications (1 citation statement)
References 29 publications
“…It is widely acknowledged that speaker adaptive training (SAT) is effective in improving ASR performance, especially for large vocabulary tasks [18][19][20]. Approaches to SAT are divided into two categories: model-based approaches, e.g., maximum likelihood linear regression (MLLR) [21], and feature-based approaches, e.g., feature-space MLLR (fMLLR) [22], i-vectors [23], speaker codes [24], and other appending features [25]. All of these methods are based on the assumption that speech transcription and/or speaker identity information are available.…”
Section: Introduction
confidence: 99%