Interspeech 2021
DOI: 10.21437/interspeech.2021-526

AlloST: Low-Resource Speech Translation Without Source Transcription

Abstract: The end-to-end architecture has made promising progress in speech translation (ST). However, the ST task is still challenging under low-resource conditions. Most ST models have shown unsatisfactory results, especially in the absence of word information from the source speech utterance. In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer. The framework is based on an attent…

Cited by 11 publications (3 citation statements)
References 23 publications
“…One of possible matters to be discussed might be if it is even possible to acquire well-founded information of speech in language that is not written. The method of byte pair encoding, which compresses a phone sequence into a syllable-like segmented sequence (Cheng, Lee, Wang, 2021, p. 2252) might be one of possible solutions to the given issue due to the fact that it is similar to a method used when a source transcription is available. Moreover, when properly explored, speech translation for low-resource languages might offer various merits in e.g.…”
Section: From Machine To Speech Translation (mentioning)
confidence: 99%
“…It is shown that recent end-to-end speech-to-text translation (S2TT) models perform comparably to the cascaded counterparts on the well-established MuST-C benchmark (Bentivogli et al, 2021). Given the scarcity of speech translation corpora, there are recent attempts on building end-to-end S2TT models under low-resource settings (Bansal et al, 2018, 2019; Cheng et al, 2021) or unsupervised settings (Chung et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%
“…In this work, we extract the parallel speech of approximately 200 hours of raw audio from ECCC and its corresponding documents in Khmer, English, and French. This corpus will be not only usable for a pure ASR, MT, and ST, but also for a wide range of advanced tasks including multilingual ASR, MT, ST, cross-lingual, multi-source translation [10,11,12,13,14,15,16], or joint training [17,18]. Sentence alignment of the source and target language is crucial in SLT corpus creation.…”
Section: Introduction (mentioning)
confidence: 99%