Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2023
DOI: 10.18653/v1/2023.acl-long.602

Simple and Effective Unsupervised Speech Translation

Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, et al.

Abstract: The amount of labeled data available to train models for speech tasks is limited for most languages; this scarcity is exacerbated for speech translation, which requires labeled data covering two different languages. To address this issue, we study a simple and effective approach to build speech translation systems without labeled data by leveraging recent advances in unsupervised speech recognition, machine translation and speech synthesis, either in a pipeline approach, or to generate pseudo-labels for training…
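The abstract describes two ways to use the unsupervised components: chaining them at inference time (a cascade), or using them to pseudo-label data. Below is a minimal sketch of the cascade variant, assuming hypothetical `transcribe`, `translate`, and `synthesize` interfaces; none of these names come from the paper itself.

```python
# Minimal sketch of the pipeline (cascade) approach described in the abstract:
# unsupervised ASR -> unsupervised MT -> unsupervised TTS.
# All class and method names are hypothetical placeholders, not the authors'
# actual API.

class UnsupervisedSpeechTranslationPipeline:
    def __init__(self, uasr, umt, utts):
        self.uasr = uasr  # unsupervised ASR: source speech -> source text
        self.umt = umt    # unsupervised MT: source text -> target text
        self.utts = utts  # unsupervised TTS: target text -> target speech

    def translate(self, source_audio):
        source_text = self.uasr.transcribe(source_audio)
        target_text = self.umt.translate(source_text)
        return self.utts.synthesize(target_text)
```

Each stage is trained on unlabeled, monolingual data only, which is what makes the overall system unsupervised with respect to parallel speech-translation labels.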

Cited by 4 publications (1 citation statement) · References 34 publications
“…Several studies have attempted to construct unsupervised S2ST (US2ST) systems. To date, Wang et al. [7] proposed developing US2ST by cascading unsupervised ASR (UASR) [8], unsupervised machine translation (UMT) [9, 10], and unsupervised TTS (UTTS) [11, 12]. UASR was trained to output pseudo-labels given only speech data, and UMT was trained to map source and target monolingual corpora into a shared latent representation via adversarial learning.…”
Section: Introduction (mentioning)
Confidence: 99%
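The citing passage notes that UASR produces pseudo-labels from speech alone. The sketch below illustrates, under the same hypothetical interfaces as the cascade above, how such a system could pseudo-label unlabeled speech to train an end-to-end model; the pair format is an illustrative assumption.

```python
# Hedged sketch of the pseudo-labeling use mentioned in the abstract: run the
# unsupervised cascade offline over unlabeled source speech to build
# (speech, target-text) pairs. The `transcribe`/`translate` method names and
# the output format are assumptions, not the authors' actual interfaces.

def generate_pseudo_labels(uasr, umt, unlabeled_audio):
    """Produce pseudo-parallel (audio, translation) training pairs."""
    pairs = []
    for audio in unlabeled_audio:
        source_text = uasr.transcribe(audio)       # unsupervised ASR output
        target_text = umt.translate(source_text)   # unsupervised MT output
        # Pair the original speech with the machine-generated translation;
        # an end-to-end speech translation model can then be trained on
        # these pairs as if they were labeled data.
        pairs.append({"audio": audio, "translation": target_text})
    return pairs
```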