ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682801
Towards End-to-end Speech-to-text Translation with Two-pass Decoding

Cited by 30 publications (17 citation statements). References 8 publications.
“…E2E ST opens the way to bridging the modality gap directly, but it is data-hungry, sample-inefficient and often underperforms cascade models, especially in low-resource settings (Bansal et al., 2018). This led researchers to explore solutions ranging from efficient neural architecture design (Karita et al., 2019; Sung et al., 2019) to the incorporation of extra training signals, including multi-task learning (Weiss et al., 2017; Liu et al., 2019b), submodule pretraining (Bansal et al., 2019; Stoian et al., 2020; Wang et al., 2020), knowledge distillation (Liu et al., 2019a), meta-learning (Indurthi et al., 2019) and data augmentation (Jia et al., 2019; Pino et al., 2019). Our work focuses on E2E ST, but we investigate feature selection, which has rarely been studied before.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…To tackle these problems, several approaches for integrated end-to-end training of cascaded models have been proposed and applied to different NLP tasks (Sung et al., 2019). Integrated end-to-end training is usually achieved by merging the consecutive models and fine-tuning the resulting system on the end-to-end training data.…”
Section: Discussion
Citation type: mentioning (confidence: 99%)
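The merged-cascade idea quoted above can be illustrated with a minimal sketch. The code below is a hedged PyTorch-style illustration, not code from the cited work: the classes MergedCascade, pretrained_asr, and pretrained_mt, and the continuous-state interface between the two components, are assumptions made purely to show how merging plus fine-tuning keeps the whole pipeline differentiable.

    # Minimal sketch (assumed interfaces, not the paper's exact recipe):
    # two pretrained cascade components, an ASR model and an MT model, are
    # merged into one network and fine-tuned on speech-to-translation pairs,
    # so the translation loss also updates the ASR component.
    import torch
    import torch.nn as nn

    class MergedCascade(nn.Module):
        def __init__(self, asr_model: nn.Module, mt_model: nn.Module):
            super().__init__()
            self.asr = asr_model   # pretrained: speech features -> hidden states
            self.mt = mt_model     # pretrained: hidden states -> translation logits

        def forward(self, speech_feats, target_tokens):
            # Pass the ASR component's hidden states (a continuous interface)
            # instead of discrete transcripts, keeping the path differentiable.
            asr_states = self.asr(speech_feats)          # [B, T_src, d_model]
            logits = self.mt(asr_states, target_tokens)  # [B, T_tgt, vocab]
            return logits

    # Fine-tuning loop on end-to-end (speech, translation) data, sketched:
    #   model = MergedCascade(pretrained_asr, pretrained_mt)
    #   logits = model(speech_feats, target_tokens)
    #   loss = nn.CrossEntropyLoss()(logits.transpose(1, 2), target_tokens)
    #   loss.backward(); optimizer.step()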
“…Motivated by the recent research on using a second decoder to do post-editing [19,20,21,22,23], we use a similar structure to achieve the goal of proofreading. As shown in Figure 2, we use the basic setting of the transformer decoder [18], and add an additional stacked multi-head attention layer after the original multi-head attention layer to deal with the phone embedding of the source speech.…”
Section: Decoder Fusion
Citation type: mentioning (confidence: 99%)
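The decoder-fusion structure described in this statement can likewise be sketched. The code below is a hedged PyTorch illustration of a Transformer decoder layer extended with a second cross-attention sub-layer over phone embeddings of the source speech; the name FusionDecoderLayer, the dimensions, and the post-norm ordering are assumptions for illustration, not the cited paper's implementation.

    # Hedged sketch: standard Transformer decoder layer plus an extra
    # cross-attention sub-layer that attends over phone embeddings, letting
    # the second-pass decoder "proofread" against the source speech.
    import torch
    import torch.nn as nn

    class FusionDecoderLayer(nn.Module):
        def __init__(self, d_model=512, nhead=8, dim_ff=2048, dropout=0.1):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(d_model, nhead, dropout, batch_first=True)
            # Additional stacked attention layer for phone embeddings (assumed placement).
            self.phone_attn = nn.MultiheadAttention(d_model, nhead, dropout, batch_first=True)
            self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model))
            self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))
            self.drop = nn.Dropout(dropout)

        def forward(self, tgt, memory, phone_emb, tgt_mask=None):
            # 1) masked self-attention over the target prefix
            a, _ = self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)
            x = self.norms[0](tgt + self.drop(a))
            # 2) original cross-attention over the encoder / first-pass memory
            a, _ = self.cross_attn(x, memory, memory)
            x = self.norms[1](x + self.drop(a))
            # 3) extra cross-attention over phone embeddings of the source speech
            a, _ = self.phone_attn(x, phone_emb, phone_emb)
            x = self.norms[2](x + self.drop(a))
            # 4) position-wise feed-forward
            return self.norms[3](x + self.drop(self.ff(x)))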