Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016
DOI: 10.18653/v1/n16-1109

An Attentional Model for Speech Translation Without Transcription

Abstract: For many low-resource languages, spoken language resources are more likely to be annotated with translations than transcriptions. This bilingual speech data can be used for word-spotting, spoken document retrieval, and even for documentation of endangered languages. We experiment with the neural, attentional model applied to this data. On phone-to-word alignment and translation reranking tasks, we achieve large improvements relative to several baselines. On the more challenging speech-to-word alignment task, ou…
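
The core mechanism the abstract refers to is attention in an encoder-decoder: as the model emits each target word, it computes a normalized weight over source positions (phones or speech frames), and those weights can be read off as soft alignments. The sketch below illustrates this with random stand-in encoder and decoder states and plain dot-product scoring; the shapes, the scoring function, and all names are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned representations (hypothetical shapes):
# enc[i] -- encoder state for source phone/frame i
# dec[t] -- decoder state just before emitting target word t
num_src, num_tgt, dim = 12, 5, 8
enc = rng.normal(size=(num_src, dim))
dec = rng.normal(size=(num_tgt, dim))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Dot-product attention, normalized over the source axis, so each
# target word's weights sum to 1.
scores = dec @ enc.T             # shape (num_tgt, num_src)
alpha = softmax(scores, axis=1)  # soft phone-to-word alignment matrix

# A hard alignment: each target word maps to its highest-attention
# source position.
print(alpha.argmax(axis=1))
```

In a trained model these attention weights are what the phone-to-word alignment task evaluates; here they are meaningless because the states are random.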

Cited by 128 publications (122 citation statements)
References 15 publications

“…Anastasopoulos et al. [6] use k-means clustering to cluster repeated audio patterns and automatically align spoken words with their translations. Duong et al. [7] focus on the alignment between speech and the translated phrase rather than directly predicting the final translations. Bérard et al. [8] give the first proof of the potential for end-to-end speech-to-text translation without using the source language.…”
Section: Related Work
Mentioning confidence: 99%
“…∀t, Σi αi,t = 1, with i indexing the source symbols). However, there is no similar constraint for the source symbols, as discussed by [5]. Rather than enforcing additional constraints on the alignments, as in the latter reference, we propose to reverse the architecture and to translate from WRL words into UL symbols, following [9].…”
Section: Word Segmentations From Attention
Mentioning confidence: 99%
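
The constraint quoted above pins down only one direction of the alignment matrix: the softmax guarantees that each target position's weights sum to 1 over the source, but nothing bounds how much total attention a single source symbol receives across target positions. Below is a small numeric sketch of that asymmetry, plus one possible heuristic (an assumption for illustration, not the cited papers' exact method) for inducing source-side segment boundaries from attention.

```python
import numpy as np

# Toy attention matrix: rows are target symbols, columns are source
# symbols; each row is softmax-normalized.
alpha = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.1, 0.1, 0.4, 0.4],
])

print(alpha.sum(axis=1))  # row sums: [1. 1. 1.] -- guaranteed by the softmax
print(alpha.sum(axis=0))  # column sums: [1.4 0.6 0.6 0.4] -- unconstrained

# One heuristic for segmentation from attention: label each source
# position with the target symbol that attends to it most; boundaries
# fall wherever the label changes.
labels = alpha.argmax(axis=0)  # [0 1 2 2]
boundaries = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
print(labels, boundaries)      # [1, 2]
```

Reversing the architecture, as the quoted work proposes, puts the UL symbols on the normalized (target) side, so each UL symbol is forced to distribute exactly one unit of attention over the WRL words.
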
“…This speech dataset was collected following a real language documentation scenario, using Lig-Aikuma,⁴ a mobile app specifically dedicated to fieldwork language documentation, which works on both Android-powered smartphones and tablets [8]. The corpus is multilingual (5,130 Mboshi speech utterances aligned to French text) and contains linguists' transcriptions in Mboshi in the form of a non-standard graphemic…”
³ The dataset is documented in [13] and available at https://github.com/besacier/mboshi-french-parallel-corpus
⁴ http://lig-aikuma.imag.fr
Section: Corpus, Baselines and Metric
Mentioning confidence: 99%
“…Prior work noted that in severe low-resource situations it may actually be easier to collect speech paired with translations than transcriptions (Duong et al., 2016). However, we focus on well-resourced languages for which ASR and MT corpora exist and for which it is more realistic to obtain good speech translation accuracy.…”
Mentioning confidence: 99%