ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683843
Large Context End-to-end Automatic Speech Recognition via Extension of Hierarchical Recurrent Encoder-decoder Models

Cited by 28 publications (19 citation statements)
References 17 publications
“…Large-context encoder-decoder models: Large-context encoder-decoder models that can capture long-range linguistic contexts beyond sentence boundaries or utterance boundaries have received significant attention in E2E-ASR [7,8], machine translation [14,15], and some natural language generation tasks [16,17]. In recent studies, transformer-based large-context encoder-decoder models have been introduced in machine translation [18,19].…”
Section: Related Work
confidence: 99%
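
To make the hierarchical encoder-decoder idea in the excerpt above concrete, the following is a minimal sketch, assuming PyTorch: an utterance-level encoder summarizes each utterance, and a session-level RNN carries that summary across utterance boundaries so a decoder could condition on context beyond the current utterance. All module and variable names (HierarchicalEncoder, utt_enc, sess_rnn) are illustrative assumptions, not taken from the cited paper.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, feat_dim=80, utt_hid=256, ctx_hid=256):
        super().__init__()
        # Utterance-level encoder: summarizes one utterance at a time.
        self.utt_enc = nn.GRU(feat_dim, utt_hid, batch_first=True)
        # Session-level RNN: carries a context vector across utterance boundaries.
        self.sess_rnn = nn.GRUCell(utt_hid, ctx_hid)

    def forward(self, utterances):
        """utterances: list of (1, T_i, feat_dim) feature tensors from one session."""
        ctx = torch.zeros(1, self.sess_rnn.hidden_size)
        outputs = []
        for feats in utterances:
            enc_out, h_n = self.utt_enc(feats)   # encode the current utterance
            ctx = self.sess_rnn(h_n[-1], ctx)    # fold its summary into the session context
            # A decoder (omitted here) would attend over enc_out and also condition
            # on ctx, i.e. on information beyond the current utterance boundary.
            outputs.append((enc_out, ctx))
        return outputs

if __name__ == "__main__":
    session = [torch.randn(1, t, 80) for t in (120, 95, 140)]  # three utterances
    for enc_out, ctx in HierarchicalEncoder()(session):
        print(enc_out.shape, ctx.shape)
```
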
“…We compared our proposed hierarchical transformer-based large-context E2E-ASR model with an RNN-based utterance-level E2E-ASR model [3], transformer-based utterance-level E2E-ASR model [6], and hierarchical RNN-based large-context E2E-ASR model [8].…”
Section: Setups
confidence: 99%
“…Alternatively, global and local topic vectors, and neural-based cache models were integrated into LMs [17][18][19]. More recently, an extra neural network component, such as a hierarchical RNN or a pretrained LM [20], was used to encode the cross-utterance information into a vector representation for LM adaptation [21][22][23]. On the other hand, improvements in cross-utterance TLMs were mainly from efficient extension of attention spans, such as using segment-wise recurrence between two adjacent segments [11], adopting adaptive attention spans, or applying specially-designed masks to cope with much longer input sequences [24,25].…”
Section: Introduction
confidence: 99%
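
The segment-wise recurrence mentioned in the excerpt above can be sketched as follows. This is a minimal illustration assuming PyTorch, not the implementation of any cited system, and it omits the relative position handling used in practice: hidden states from the previous segment are cached, detached from the gradient graph, and prepended as extra keys and values so attention can reach across segment boundaries.

```python
import torch
import torch.nn as nn

class SegmentRecurrentAttention(nn.Module):
    """One self-attention layer with a cache of the previous segment's states."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.memory = None  # hidden states cached from the previous segment

    def forward(self, x):
        """x: (batch, seg_len, d_model) hidden states of the current segment."""
        # Prepend the cached segment so keys/values span two adjacent segments.
        kv = x if self.memory is None else torch.cat([self.memory, x], dim=1)
        out, _ = self.attn(query=x, key=kv, value=kv)
        # Cache the current segment without letting gradients flow across segments.
        self.memory = x.detach()
        return out

if __name__ == "__main__":
    layer = SegmentRecurrentAttention()
    seg1, seg2 = torch.randn(2, 32, 256), torch.randn(2, 32, 256)
    print(layer(seg1).shape)  # attends over segment 1 only
    print(layer(seg2).shape)  # attends over segments 1 and 2
```

Detaching the cache is what keeps training cost per segment constant while still extending the effective attention span, which is the trade-off the excerpt attributes to these cross-utterance TLM variants.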