2022
DOI: 10.48550/arxiv.2203.10426
Preprint

STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation

Abstract: How to learn a better speech representation for end-to-end speech-to-text translation (ST) with limited labeled data? Existing techniques often attempt to transfer powerful machine translation (MT) capabilities to ST, but neglect the representation discrepancy across modalities. In this paper, we propose the Speech-TExt Manifold Mixup (STEMM) method to calibrate such discrepancy. Specifically, we mix up the representation sequences of different modalities, and take both unimodal speech sequences and multimodal…

Citations: cited by 2 publications (5 citation statements)
References: 39 publications
“…Speech recognition [42,6,21] is widely used in online meetings and social applications to recognize speech content. Speech translation [55,62,18] is commonly used in simultaneous interpretation applications for cross-lingual communication in cross-border travel and meetings. Keyword spotting [5,49,28] is employed in short video applications to quickly retrieve relevant content.…”
Section: Related Work 21 Audio-visual Speech
Type: mentioning
confidence: 99%
“…Many works [64,19,60,18,23] bridge the gap between modalities with mixup. [63] proposes mixup for data augmentation to improve model robustness.…”
Section: Mixup For Cross-modality Transfer
Type: mentioning
confidence: 99%