M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation

Zhao, Jichun; Hao, Yang; Shareghi, Ehsan; Haffari, Gholamreza

doi:10.21437/interspeech.2022-592

Cited by 5 publications

(2 citation statements)

References 28 publications

“…It processes the audio features obtained by applying 80-dimensional Mel filterbanks to the audio waveform. The W2V-BERT encoder is followed by a Length Adapter based on a modified version of the M-adaptor (Zhao et al, 2022), which is a Transformer-based model (Vaswani et al, 2017) that is in charge of compressing the speech representation (by a factor of 8) through attention pooling. The compressed input representations are then fed to the NLLB decoder, in its 1.3B parameters configuration, to produce the translations.…”

Section: Simulseamless (mentioning)

confidence: 99%

Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation

Papi, Gaido, Negri et al. 2022

Proceedings of the Third Workshop on Automatic Simultaneous Translation


Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency, which is normally computed in terms of Average Lagging (AL). In this paper we highlight that, despite its widespread adoption, AL provides underestimated scores for systems that generate longer predictions compared to the corresponding references. We also show that this problem has practical relevance, as recent SimulST systems have indeed a tendency to over-generate. As a solution, we propose LAAL (Length-Adaptive Average Lagging), a modified version of the metric that takes into account the over-generation phenomenon and allows for unbiased evaluation of both under-/overgenerating systems.
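The difference between AL and LAAL described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used speech formulation in which each emitted word has a delay (the amount of source audio consumed when it was emitted) and the oracle delay grows linearly with the word index; LAAL is assumed to differ from AL only in normalizing by the longer of the hypothesis and the reference. The function name `average_lagging` and its parameters are illustrative and not taken from the paper's code.

```python
from typing import Sequence


def average_lagging(
    delays: Sequence[float],
    src_len: float,
    ref_len: int,
    length_adaptive: bool = False,
) -> float:
    """Sketch of AL (length_adaptive=False) and LAAL (length_adaptive=True).

    delays:  source consumed (e.g. in ms) when each hypothesis word was emitted
    src_len: total source duration, in the same unit as delays
    ref_len: number of words in the reference translation
    """
    if not delays:
        raise ValueError("delays must contain at least one emitted word")
    hyp_len = len(delays)
    # Assumption: LAAL replaces the reference length with
    # max(hypothesis length, reference length); AL uses the reference length.
    tgt_len = max(hyp_len, ref_len) if length_adaptive else ref_len
    step = src_len / tgt_len  # oracle delay increment per emitted word
    # tau: 1-based index of the first word emitted after the whole source
    # has been consumed; fall back to the last word if that never happens.
    tau = next(
        (i for i, d in enumerate(delays, start=1) if d >= src_len), hyp_len
    )
    total = sum(delays[i - 1] - (i - 1) * step for i in range(1, tau + 1))
    return total / tau


# Over-generating system: 4 words emitted against a 2-word reference.
delays = [1000.0, 2000.0, 3000.0, 4000.0]
al = average_lagging(delays, src_len=4000.0, ref_len=2)
laal = average_lagging(delays, src_len=4000.0, ref_len=2, length_adaptive=True)
```

In this toy case the hypothesis is twice as long as the reference, so the AL oracle runs ahead of the system and the score turns negative, while the length-adaptive variant stays at one second of lag. When the hypothesis is not longer than the reference, the two functions return the same value.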


“…2 Training batch size for a modern ST system (Gállego et al, 2021) could not exceed 1 on a V100 16GB GPU. representation length, and Zhao et al (2022) proposed a Transformer-based adaptor to shrink a sequence. Yet, the complexity of encoding remains high.…”

Section: Introduction (mentioning)

confidence: 99%

RedApt: An Adaptor for wav2vec 2 Encoding - Faster and Smaller Speech Translation without Quality Compromise

Zhao, Hao, Haffari et al. 2022

Findings of the Association for Computational Linguistics: EMNLP 2022


Pre-trained speech Transformers in speech translation (ST) have facilitated state-of-the-art (SotA) results; yet, using such encoders is computationally expensive. To improve this, we present a novel Reducer Adaptor block, RedApt, that could be seamlessly integrated within any Transformer-based speech encoding architecture. Integrating the pretrained WAV2VEC 2 speech encoder with RedApt brings 41% speedup, 33% memory reduction with 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU score on 8 language pairs from Must-C.


Learning Semantic Information from Machine Translation to Improve Speech-to-Text Translation

Deng, Zhang, Zhou et al. 2023

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

