2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru.2017.8268971

Exploring the use of acoustic embeddings in neural machine translation

Abstract: Neural Machine Translation (NMT) has recently demonstrated improved performance over statistical machine translation and relies on an encoder-decoder framework for translating text from source to target. The structure of NMT makes it amenable to adding auxiliary features, which can provide complementary information to that present in the source text. In this paper, auxiliary features derived from accompanying audio are investigated for NMT and are compared and combined with text-derived features. These acoustic …
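The auxiliary-feature setup the abstract describes can be sketched roughly as follows. This is a minimal illustration under assumed details, not the paper's implementation: the dimensions, values, and the choice of concatenation as the combination method are all invented for the example.

```python
import numpy as np

# Illustrative sketch: combining a text-derived word embedding with an
# acoustic embedding as auxiliary input to an NMT encoder.
# All dimensions and values below are made up for demonstration.
rng = np.random.default_rng(0)

text_dim, acoustic_dim = 4, 3
word_embedding = rng.standard_normal(text_dim)          # from source text
acoustic_embedding = rng.standard_normal(acoustic_dim)  # from accompanying audio

# One simple way to supply auxiliary features: concatenate them with
# the text embedding before feeding the encoder.
encoder_input = np.concatenate([word_embedding, acoustic_embedding])
print(encoder_input.shape)  # (7,)
```

Concatenation is only one of several plausible combination strategies; the paper compares and combines text- and audio-derived features, and other schemes (e.g. summation or gating) would fit the same encoder-decoder structure.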

Cited by 8 publications (4 citation statements)
References 18 publications
“…Often, the prior or future context from video, audio, or other subtitle instances is necessary to fill these contextual gaps. Sentence-level APE cannot address these issues robustly, which calls for further research on multimodal (Deena et al., 2017; Caglayan et al., 2019) and document-level (Hardmeier et al., 2015; Voita et al., 2019) translation and post-editing, especially for subtitles.…”
Section: Qualitative Analysis
confidence: 99%
“…We build our systems on three speech translation corpora: Fisher-CallHome Spanish, Librispeech, and Speech-Translation TED (ST-TED) corpus. To the best of our knowledge, these are the only publicly available corpora recorded with a reasonable size of real speech data. The data statistics are summarized in Table 1.…”
Section: Data
confidence: 99%
“…Recently, end-to-end speech translation (E2E-ST) with a sequence-to-sequence model has attracted attention for its extremely simplified architecture without complicated pipeline systems [3,4,5]. By directly translating speech signals in a source language to text in a target language, the model is able to avoid error propagation from the ASR module, and also leverages acoustic clues in the source language, which have been shown to be useful for translation [6]. Moreover, it is more memory- and computationally efficient since complicated decoding for the ASR module and the latency occurring between ASR and MT modules can be bypassed.…”
Section: Introduction
confidence: 99%
“…One key attribute of embedding methods is that word embedding models take into account context information of words, thereby allowing a more compact and manageable representation for words [3,4]. The embeddings are widely applied in many downstream NLP tasks such as neural machine translation, dialogue systems or text summarisation [5,6,7], as well as in language modelling for speech recognition [8].…”
Section: Introduction
confidence: 99%
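The compact word representation described in the last citation statement can be sketched as a simple embedding lookup. This is a hypothetical illustration: the vocabulary, dimensionality, and values are invented, and real embedding models learn the vectors from context rather than sampling them randomly.

```python
import numpy as np

# Minimal sketch of a word-embedding lookup table (illustrative only).
# Each word index maps to a dense vector, which is far more compact
# than a one-hot representation over the full vocabulary.
vocab = {"speech": 0, "translation": 1, "embedding": 2}
embed_dim = 5
rng = np.random.default_rng(42)
embedding_matrix = rng.standard_normal((len(vocab), embed_dim))

def embed(word: str) -> np.ndarray:
    """Return the dense vector for a word via table lookup."""
    return embedding_matrix[vocab[word]]

vector = embed("translation")
print(vector.shape)  # (5,)
```

In a trained model the rows of `embedding_matrix` would be learned so that words appearing in similar contexts receive similar vectors, which is what makes the representation useful downstream.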