ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054116

Independent Language Modeling Architecture for End-To-End ASR

Abstract: The attention-based end-to-end (E2E) automatic speech recognition (ASR) architecture allows for joint optimization of acoustic and language models within a single network. However, in a vanilla E2E ASR architecture, the decoder sub-network (subnet), which plays the role of the language model (LM), is conditioned on the encoder output. This entangles the acoustic encoder with the language model, so the LM cannot be trained separately on external text data. To address th…
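The decoupling the abstract describes can be illustrated with a minimal sketch: a decoder whose LM subnet is conditioned only on the token history, with the acoustic context fused in a later layer. This is an assumption-laden illustration (the module names lm_subnet and fusion, and the zero-context text-only mode, are our own), not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class IndependentLMDecoder(nn.Module):
    """Sketch of a decoder whose LM subnet never sees acoustic features.

    The LSTM runs on previous tokens only, so it can be (pre-)trained as a
    plain language model on external text; the encoder context is fused
    only afterwards. Module names and the fusion scheme are illustrative
    assumptions, not the paper's exact design.
    """

    def __init__(self, vocab_size: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        # LM subnet: conditioned on the token history alone.
        self.lm_subnet = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Fusion layer: combines the LM state with the attention context.
        self.fusion = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, prev_tokens, context=None):
        h, _ = self.lm_subnet(self.embed(prev_tokens))
        if context is None:
            # Text-only mode: run the decoder as a pure LM, e.g. for
            # separate training on external text data.
            context = torch.zeros_like(h)
        return self.fusion(torch.cat([h, context], dim=-1))
```

Because the acoustic context enters only through the fusion layer, the same decoder weights can be updated from text-only batches (context=None) and from paired speech-text batches, which is the separability the abstract argues a vanilla E2E decoder lacks.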

Cited by 9 publications (11 citation statements) | References 19 publications
“…An LSTM-based encoder-decoder architecture [1], denoted as A1 in the rest of this paper, consists of a bidirectional LSTM encoder and an LSTM-based decoder, as shown in Fig. 1.…”
Section: Baseline Architectures, 2.1 LSTM-Based Encoder-Decoder Architecture
confidence: 99%
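For context, the A1 baseline this statement refers to can be sketched roughly as below. The layer sizes and the dot-product attention are assumptions made for illustration, not details taken from [1].

```python
import torch
import torch.nn as nn

class BaselineA1(nn.Module):
    """Rough sketch of A1: a bidirectional LSTM encoder and an LSTM decoder
    joined by dot-product attention (sizes and attention type assumed)."""

    def __init__(self, feat_dim: int, vocab_size: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim,
                               bidirectional=True, batch_first=True)
        self.embed = nn.Embedding(vocab_size, 2 * hidden_dim)
        self.decoder = nn.LSTM(2 * hidden_dim, 2 * hidden_dim,
                               batch_first=True)
        self.out = nn.Linear(4 * hidden_dim, vocab_size)

    def forward(self, feats, prev_tokens):
        enc, _ = self.encoder(feats)                      # (B, T, 2H)
        dec, _ = self.decoder(self.embed(prev_tokens))    # (B, U, 2H)
        # Dot-product attention: each decoder step attends over encoder states.
        scores = torch.bmm(dec, enc.transpose(1, 2))      # (B, U, T)
        context = torch.bmm(scores.softmax(dim=-1), enc)  # (B, U, 2H)
        return self.out(torch.cat([dec, context], dim=-1))
```

In a full vanilla architecture the attention context also feeds back into the decoder recurrence at every step, which is exactly the acoustic-LM entanglement the cited paper sets out to remove.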
“…Such techniques not only require external language models but also lead to slow inference. To tackle this problem, [1] proposed a long short-term memory (LSTM)-based encoder-decoder architecture that allows the LM capacity of the decoder to be improved using extra text data. However, it used an LSTM structure for the encoder, which has shown limited modeling capacity as well as slow training.…”
Section: Introduction
confidence: 99%