2019
DOI: 10.48550/arxiv.1910.00254
Preprint

Multilingual End-to-End Speech Translation

Abstract: In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they have been applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in …
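To make the "universal sequence-to-sequence architecture" concrete, here is a minimal sketch of one common way to realize a single multilingual ST model: a shared encoder over speech features and a shared decoder that is told which language to produce via a target-language tag token. The tag mechanism, the LSTM layers, and all dimensions are illustrative assumptions, not the authors' exact formulation (attention is omitted for brevity).

import torch
import torch.nn as nn

class MultilingualST(nn.Module):
    def __init__(self, n_mels=80, d_model=256, vocab_size=10000, n_langs=3):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, d_model, num_layers=2, batch_first=True)
        # Shared embedding table: subword tokens plus one tag id per target language.
        self.embed = nn.Embedding(vocab_size + n_langs, d_model)
        self.decoder = nn.LSTM(d_model, d_model, num_layers=2, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, speech, tgt_ids, lang_tag_id):
        # speech: (batch, frames, n_mels) filterbank features.
        _, (h, c) = self.encoder(speech)
        # Prepend the target-language tag so a single shared decoder can
        # generate any of the supported target languages.
        tags = torch.full((tgt_ids.size(0), 1), lang_tag_id, dtype=torch.long)
        dec_in = self.embed(torch.cat([tags, tgt_ids], dim=1))
        dec_out, _ = self.decoder(dec_in, (h, c))
        return self.out(dec_out)  # (batch, 1 + tgt_len, vocab_size)

Tag ids live in the range vocab_size .. vocab_size + n_langs - 1, so switching target language at inference time is just a matter of passing a different lang_tag_id.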

Cited by 7 publications (7 citation statements) · References 45 publications
“…Base setting: LSTM ASR + MT (Bérard et al., 2018): 14.60; LSTM ASR + MT (Inaguma et al., 2019): 15.80; Transformer ASR + MT (Liu et al., 2019a): 17[…] and decoder. We also achieve better results than a knowledge distillation baseline in which an MT model is introduced to teach the ST model (Liu et al., 2019a).…”
Section: Results
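The knowledge distillation baseline mentioned above follows the general recipe of Liu et al. (2019a): an MT teacher, fed the gold transcript, produces soft target distributions that the ST student is trained to match in addition to the usual cross-entropy loss. The sketch below shows a word-level variant; the function name, the loss weighting, and the temperature are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_ids,
                      alpha=0.8, temperature=1.0, pad_id=0):
    """student_logits, teacher_logits: (batch, tgt_len, vocab);
    gold_ids: (batch, tgt_len) reference subword ids."""
    vocab = student_logits.size(-1)
    mask = (gold_ids != pad_id).float()

    # Hard loss: standard cross-entropy against the reference tokens.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         gold_ids.view(-1),
                         ignore_index=pad_id, reduction="sum")

    # Soft loss: cross-entropy to the MT teacher's output distribution.
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    soft = -(t_prob * s_logp).sum(-1)   # per-token teacher cross-entropy
    soft = (soft * mask).sum()

    n_tokens = mask.sum().clamp(min=1)
    return (alpha * soft + (1 - alpha) * ce) / n_tokens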
“…In the expanded setting, Bahar et al. (2019b) apply SpecAugment (Park et al., 2019) to the Librispeech English-French ST task, using a total of 236h of speech for ASR pre-training. Inaguma et al. (2019) combine three ST datasets with 472h of training data to train a multilingual ST model for both the Librispeech English-French and the IWSLT2013 English-German ST tasks. Wang et al. (2019) introduce an additional 272h ASR corpus and 41M parallel sentences from WMT18 to enhance ST performance.…”
Section: Expanded Setting with External Data
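For readers unfamiliar with SpecAugment (Park et al., 2019), it augments training data by masking bands of the log-mel spectrogram. A minimal sketch of the frequency- and time-masking steps follows; time warping is omitted, and the mask-size parameters are illustrative, not the values used in the cited papers.

import numpy as np

def spec_augment(spec, num_freq_masks=2, F=27, num_time_masks=2, T=100,
                 rng=np.random.default_rng()):
    """spec: (num_mel_bins, num_frames) log-mel features; returns a masked copy."""
    spec = spec.copy()
    n_mels, n_frames = spec.shape

    # Frequency masking: zero out f consecutive mel bins, f ~ U[0, F].
    for _ in range(num_freq_masks):
        f = int(rng.integers(0, F + 1))
        f0 = int(rng.integers(0, max(1, n_mels - f)))
        spec[f0:f0 + f, :] = 0.0

    # Time masking: zero out t consecutive frames, t ~ U[0, min(T, n_frames)].
    for _ in range(num_time_masks):
        t = int(rng.integers(0, min(T, n_frames) + 1))
        t0 = int(rng.integers(0, max(1, n_frames - t)))
        spec[:, t0:t0 + t] = 0.0

    return spec

Because the masking happens on features rather than waveforms, it adds negligible cost to each training step, which is part of why it transfers so easily from ASR pre-training to ST fine-tuning.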
“…We used the Fisher Spanish corpus (Graff et al., 2010) to perform Spanish speech to English text translation. We followed previous work (Inaguma et al., 2019) for the pre-processing steps, and the 40h/160h training sets and the standard dev and test sets are used for the experiments. Byte-pair encoding (BPE) (Kudo and Richardson, 2018) was applied to the target transcriptions to form 10K subwords as the target of the translation part.…”
Section: Methods
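The cited BPE toolkit (Kudo and Richardson, 2018) is SentencePiece, so the 10K-subword target vocabulary described above can be reproduced along these lines. The file paths are hypothetical placeholders; the quoted paper does not specify them.

import sentencepiece as spm

# Train a 10K-subword BPE model on the English target transcriptions.
spm.SentencePieceTrainer.train(
    input="fisher_train.en",       # hypothetical path to target-side text
    model_prefix="target_bpe10k",
    vocab_size=10000,
    model_type="bpe",
)

# Encode a target sentence into subword ids for the translation decoder.
sp = spm.SentencePieceProcessor(model_file="target_bpe10k.model")
ids = sp.encode("yes I live in New Jersey", out_type=int)
print(sp.decode(ids))              # round-trips back to the original string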
“…Previous work shows that deep end-to-end models can outperform conventional pipeline systems given sufficient training data (Weiss et al., 2017; Inaguma et al., 2019; […]). Nevertheless, well-annotated bilingual data is expensive and hard to collect (Bansal et al., 2018a,b; Duong et al., 2016).…”
Section: Introduction
“…Recently, there has been much interest in end-to-end speech translation (ST) models [1,2,3,4,5,6,7], which, compared to traditional cascaded models, are simpler and computationally more efficient, can preserve more acoustic information such as prosody, and can avoid propagating errors from the speech recognition component. Large amounts of annotated data are usually required to achieve good performance with such systems, but supervised training data for ST remains very limited.…”
Section: Introduction