ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

Boito, Marcely Zanon; Ortega, John; Riguidel, Hugo; Laurent, Antoine; Barrault, Loïc; Bougares, Fethi; Chaabani, Firas; Nguyen, Ha H.; Barbier, Florentin; Gahbiche, Souhir; Estève, Yannick

doi:10.18653/v1/2022.iwslt-1.28

Cited by 5 publications

(5 citation statements)

References 22 publications

“…The results from the submissions showed that using powerful speech feature extractors such as Wav2Vec 2.0 and massive multilingual decoders such as mBART-50 does not stop low-resource ST from being a major challenge. Of the three submissions, training self-supervised models on the target data and producing artificial supervision seemed to be the most effective approach to solving the problem (Zanon Boito et al, 2022). Previous, well-performing systems submitted to the IWSLT offline and low-resource speech translation tracks made use of various methods to improve the performance of their cascade system.…”

Section: Previous IWSLT Approaches For (mentioning)

confidence: 99%

“…Previous, well-performing systems submitted to the IWSLT offline and low-resource speech translation tracks made use of various methods to improve the performance of their cascade system. For the ASR component, many submissions used a combination of transformer and conformer models (Zhang et al, 2022; Li et al, 2022; Nguyen et al, 2021) or fine-tuned existing models (Zhang and Ao, 2022; Zanon Boito et al, 2022; Denisov et al, 2021). They managed to increase ASR performance by voice activity detection for segmentation (Zhang et al, 2022; Ding and Tao, 2021), training the ASR on synthetic data with added punctuation, noise-filtering and domain-specific finetuning (Zhang and Ao, 2022; Li et al, 2022) or adding an intermediate model that cleans the ASR output in terms of casing and punctuation (Nguyen et al, 2021).…”

Section: Previous IWSLT Approaches For (mentioning)

confidence: 99%

UM-DFKI Maltese Speech Translation

Williams, Abela, Kumar et al. 2023

Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

For the 2023 IWSLT (Agarwal et al., 2023) Maltese Speech Translation Task, UM-DFKI jointly presents a cascade solution which achieves 0.6 BLEU. While this is the first time that a Maltese speech translation task has been released by IWSLT, this paper explores previous solutions for other speech translation tasks, focusing primarily on low-resource scenarios. Moreover, we present our method of fine-tuning XLS-R models for Maltese ASR using a collection of multi-lingual speech corpora as well as the fine-tuning of the mBART model for Maltese to English machine translation.

“…Conversational Speech Translation Work on conversational ST (Kumar et al, 2014b,a; Zanon Boito et al, 2022) has mainly focused on single-speaker speech, either segmented manually or automatically, via voice activity detection. Manual segmentation was assumed in recent studies, based on the Fisher and CALLHOME corpora, on cascaded ST (Kumar et al, 2014b), E2E-ST (Weiss et al, 2017; Peng et al, 2023), simultaneous ASR & ST (Soky et al, 2022), streamed ST (Deng et al, 2022), and multilingual ST (Inaguma et al, 2019).…”

Section: Related Work (mentioning)

confidence: 99%

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

Zuluaga-Gomez, Huang, Niu et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combines automatic speech recognition, speech translation and speaker turn detection using special tokens in a serialized labeling format. We run experiments on the Fisher-CALLHOME corpus, which we adapted by merging the two single-speaker channels into one multi-speaker channel, thus representing the more realistic and challenging scenario with multi-speaker turns and cross-talk. Experimental results across single- and multi-speaker conditions, and against conventional ST systems, show that our model outperforms the reference systems on the multi-speaker condition, while attaining comparable performance on the single-speaker condition. We release scripts for data processing and model training.

“…• ON-TRAC (Laurent et al, 2023) participated in two language-pairs in the Low-Resource task as well as this task. For this task, they focused on using SAMU-XLS-R as the multilingual, multimodal pretrained speech encoder and mBART as the text decoder.…”

Section: Submissions (mentioning)

confidence: 99%

“…• ON-TRAC (Laurent et al, 2023) participated in the Pashto-French (one primary and three contrastive systems, both for constrained and unconstrained settings) and Tamasheq-French (one primary and five contrastive systems, all of which are unconstrained; cf. Table 44).…”

Section: Submissions (mentioning)

confidence: 99%

Findings of the IWSLT 2023 Evaluation Campaign

Agarwal, Agrawal, Anastasopoulos et al. 2023

Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper reports on the shared tasks organized by the 20th IWSLT Conference. The shared tasks address 9 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, multilingual, dialect and low-resource speech translation, and formality control. The shared tasks attracted a total of 38 submissions by 31 teams. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.
