Interspeech 2018
DOI: 10.21437/interspeech.2018-1326

Low-Resource Speech-to-Text Translation

Abstract: Speech-to-text translation has many potential applications for low-resource languages, but the typical approach of cascading speech recognition with machine translation is often impossible, since the transcripts needed to train a speech recognizer are usually not available for low-resource languages. Recent work has found that neural encoder-decoder models can learn to directly translate foreign speech in high-resource scenarios, without the need for intermediate transcription. We investigate whether this appr…

Cited by 56 publications (61 citation statements)
References 27 publications
“…Pre-training can be done in different ways as proposed in the literature. The common way is to use an ASR encoder and an MT decoder to initialize the parameters of the ST network correspondingly [20]. Surprisingly, using an ASR model to pre-train both the encoder and the decoder of the ST model works well [19].…”
Section: Pre-training
Type: mentioning
confidence: 99%
“…The end-to-end model has advantages over the cascaded pipeline, however, its training requires a moderate amount of paired speech-to-text data which is not easy to acquire. Therefore, recently some techniques such as multitask learning [13], [15][16][17], pre-training different components of the model [18][19][20] and generating synthetic data [21] have been proposed to mitigate the lack of ST parallel training data. These methods aim to use weakly supervised data, i.e.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Examples of source transcripts and original translations with the fluent counterparts are shown below in Table 1.

SRC eh, eh, eh, um, yo pienso que es así
ORG uh, uh, uh, um, i think it's like that
FLT i think it's like that

SRC también tengo um eh estoy tomando una clase ..
ORG i also have um eh i'm taking a marketing class ..
FLT i'm also taking a marketing class

SRC porque qué va, mja ya te acuerda que ..
ORG because what is, mhm do you recall now that ..
FLT do you recall now that ..

SRC y entonces am es entonces la universidad donde yo estoy es university of pennsylvania
ORG and so am and so the university where i am it's the university of pennsylvania
FLT i am at the university of pennsylvania

3 Speech-to-Text Model

Initial work on the Fisher-Spanish dataset used HMM-GMM ASR models linked with phrase-based MT using lattices (Post et al., 2013; Kumar et al., 2014). More recently, it was shown in Weiss et al. (2017) and Bansal et al. (2018) that end-to-end SLT models perform competitively on this task. As in Bansal et al. (2018), we use a sequence-to-sequence architecture inspired by Weiss et al. but modified to train within available resources; specifically, all models may be trained in less than 5 days on one GPU.…”
Section: Data
Type: mentioning
confidence: 99%
“…Low-resource automatic speech-to-text translation (AST) has recently gained traction as a way to bring NLP tools to under-represented languages. An end-to-end approach [1][2][3][4][5][6][7] is particularly appealing for source languages with no written form, or for endangered languages where translations into a high-resource language may be easier to collect than transcriptions [8]. However, building high-quality end-to-end AST with little parallel data is challenging, and has led researchers to explore how other sources of data could be used to help.…”
Section: Introduction
Type: mentioning
confidence: 99%