2021
DOI: 10.48550/arxiv.2104.00120
Preprint

Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition

Abstract: Stream fusion, also known as system combination, is a common technique in automatic speech recognition for traditional hybrid hidden Markov model approaches, yet mostly unexplored for modern deep neural network end-to-end model architectures. Here, we investigate various fusion techniques for the all-attention-based encoder-decoder architecture known as the transformer, striving to achieve optimal fusion by investigating different fusion levels in an example single-microphone setting with fusion of standard ma…
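As an illustration of what fusion at different levels can look like in a transformer-based ASR system, below is a minimal PyTorch sketch (my own illustration, not the paper's implementation) contrasting encoder-level fusion (concatenating the two encoder memories before decoding) with late fusion (averaging the per-stream output log-posteriors). Feature dimensions, layer counts, and module names are illustrative assumptions; attention masks are omitted for brevity.

```python
# Minimal sketch (not the paper's implementation) of two fusion levels for a
# two-stream transformer ASR model: early fusion of the encoder memories by
# concatenation along time, and late fusion by averaging output log-posteriors.
# All dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn

class TwoStreamTransformerASR(nn.Module):
    def __init__(self, feat_dim_a, feat_dim_b, d_model=256, vocab=1000):
        super().__init__()
        self.proj_a = nn.Linear(feat_dim_a, d_model)
        self.proj_b = nn.Linear(feat_dim_b, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder_a = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.encoder_b = nn.TransformerEncoder(enc_layer, num_layers=4)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        self.embed = nn.Embedding(vocab, d_model)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, feats_a, feats_b, tokens, fusion="early"):
        mem_a = self.encoder_a(self.proj_a(feats_a))  # stream A memory
        mem_b = self.encoder_b(self.proj_b(feats_b))  # stream B memory
        tgt = self.embed(tokens)
        if fusion == "early":
            # Encoder-level fusion: the decoder attends to both streams jointly.
            memory = torch.cat([mem_a, mem_b], dim=1)
            return self.out(self.decoder(tgt, memory)).log_softmax(-1)
        # Late fusion: decode per stream and average the log-posteriors.
        log_p_a = self.out(self.decoder(tgt, mem_a)).log_softmax(-1)
        log_p_b = self.out(self.decoder(tgt, mem_b)).log_softmax(-1)
        return 0.5 * (log_p_a + log_p_b)

# Usage with random features standing in for two front-ends (hypothetical dims).
model = TwoStreamTransformerASR(feat_dim_a=80, feat_dim_b=40)
a = torch.randn(2, 120, 80)                 # (batch, frames, feat_dim_a)
b = torch.randn(2, 120, 40)                 # (batch, frames, feat_dim_b)
y = torch.randint(0, 1000, (2, 15))         # (batch, target_len)
print(model(a, b, y, fusion="early").shape) # (2, 15, 1000)
print(model(a, b, y, fusion="late").shape)  # (2, 15, 1000)
```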

Cited by 4 publications (4 citation statements)
References 37 publications
“…Fusion is another method for utilizing multiple candidates (Liu et al., 2021; Lohrenz et al., 2021). Specifically, Fusion uses a shared encoder to extract the representation of each candidate […] The comparison results of FastCorrect 2 with the above two baselines are shown in Table 5.…”
Section: Comparison With Other Methods
confidence: 99%
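For readers unfamiliar with the Fusion baseline described in the snippet above, here is a hedged sketch of the general idea as I read it: a shared encoder embeds each candidate hypothesis, and the per-candidate representations are combined by attention pooling. This is a generic illustration, not FastCorrect 2's actual code; all names and dimensions are assumptions.

```python
# Generic sketch of candidate fusion with a shared encoder (illustrative only).
import torch
import torch.nn as nn

class CandidateFusion(nn.Module):
    def __init__(self, vocab=1000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fusion_weights = nn.Linear(d_model, 1)

    def forward(self, candidates):
        # candidates: (batch, n_candidates, seq_len) token ids
        b, n, t = candidates.shape
        x = self.embed(candidates).view(b * n, t, -1)
        h = self.shared_encoder(x).mean(dim=1).view(b, n, -1)  # one vector per candidate
        w = self.fusion_weights(h).softmax(dim=1)               # attention over candidates
        return (w * h).sum(dim=1)                               # fused representation

fused = CandidateFusion()(torch.randint(0, 1000, (2, 4, 30)))
print(fused.shape)  # (2, 256)
```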
“…For the multi-encoder transformer, researchers have focused on the different types of input values (sources) to be received and encoded, and on how to combine the representations generated by the multiple encoders for use in the decoder. Multi-encoder approaches have mainly been studied in relation to speech recognition [20], [21], [22], neural machine translation, and automatic post-editing (APE), the task of correcting errors in machine-translated texts [10], [23], [24], [25], [26], [27]. In the existing multi-encoder method, two encoders are mainly used.…”
Section: Multi-encoder Transformer
confidence: 99%
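One common way to combine the representations from two encoders inside the decoder, as the snippet above discusses, is to give each decoder layer a separate cross-attention block per encoder. The sketch below shows that sequential cross-attention variant in plain PyTorch; it is a generic illustration under my own assumptions, not the specific combination scheme of any work cited here.

```python
# Hedged sketch of a dual-source decoder layer: self-attention, then
# cross-attention to encoder A, then to encoder B (illustrative only).
import torch
import torch.nn as nn

class DualSourceDecoderLayer(nn.Module):
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_a = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_b = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))

    def forward(self, tgt, mem_a, mem_b):
        x = self.norms[0](tgt + self.self_attn(tgt, tgt, tgt)[0])
        x = self.norms[1](x + self.cross_a(x, mem_a, mem_a)[0])  # attend to encoder A
        x = self.norms[2](x + self.cross_b(x, mem_b, mem_b)[0])  # then to encoder B
        return self.norms[3](x + self.ffn(x))

layer = DualSourceDecoderLayer()
tgt = torch.randn(2, 15, 256)
out = layer(tgt, torch.randn(2, 100, 256), torch.randn(2, 80, 256))
print(out.shape)  # (2, 15, 256)
```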
“…Lohrenz et al. [24] proposed multi-encoder learning and stream fusion for Transformer-based end-to-end automatic speech recognition. It would be valuable to develop a new continuous HMR scheme based on a Transformer-based encoder–decoder architecture.…”
Section: Related Work
confidence: 99%