Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
DOI: 10.18653/v1/2022.iwslt-1.28

ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for ASR. Our results show that in our settings pipeline approaches are still very competitive…
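The cascaded approach the abstract contrasts with end-to-end modeling is a two-stage pipeline: a fine-tuned wav2vec 2.0 model transcribes the speech, and a separate text-to-text model translates the transcript. Below is a minimal sketch of that pattern using the Hugging Face transformers API; the checkpoint names are illustrative placeholders, not the ON-TRAC Consortium's actual models.

```python
# Minimal sketch of a cascaded speech translation pipeline:
# stage 1 transcribes speech with a (fine-tuned) wav2vec 2.0 ASR model,
# stage 2 translates the transcript with a text-to-text MT model.
# Checkpoint names are illustrative placeholders, not the ON-TRAC systems.
import numpy as np
import torch
from transformers import (
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
)

asr_processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h")
asr_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h")

mt_tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ar-en")
mt_model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-ar-en")

def cascade_translate(waveform: np.ndarray, sampling_rate: int = 16_000) -> str:
    """Translate a 1-D mono waveform (float values, 16 kHz) via ASR then MT."""
    # Stage 1: CTC-based ASR with wav2vec 2.0.
    inputs = asr_processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = asr_model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)
    transcript = asr_processor.batch_decode(ids)[0]

    # Stage 2: text-to-text machine translation of the transcript.
    batch = mt_tokenizer(transcript, return_tensors="pt")
    generated = mt_model.generate(**batch, max_new_tokens=256)
    return mt_tokenizer.decode(generated[0], skip_special_tokens=True)
```

An end-to-end model, by contrast, maps the waveform directly to target-language text with a single network, avoiding error propagation from the ASR stage at the cost of needing paired speech-translation data.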

Cited by 5 publications (5 citation statements). References 22 publications.
“…The results from the submissions showed that using powerful speech feature extractors such as Wav2Vec 2.0 and massive multilingual decoders such as mBART-50 does not stop low-resource ST from being a major challenge. Of the three submissions, training self-supervised models on the target data and producing artificial supervision seemed to be the most effective approach to solving the problem (Zanon Boito et al., 2022). Previous, well-performing systems submitted to the IWSLT offline and low-resource speech translation tracks made use of various methods to improve the performance of their cascade systems.…”
Section: Previous IWSLT Approaches (mentioning; confidence: 99%)
“…Previous, well-performing systems submitted to the IWSLT offline and low-resource speech translation tracks made use of various methods to improve the performance of their cascade system. For the ASR component, many submissions used a combination of transformer and conformer models (Zhang et al., 2022; Li et al., 2022; Nguyen et al., 2021) or fine-tuned existing models (Zhang and Ao, 2022; Zanon Boito et al., 2022; Denisov et al., 2021). They managed to increase ASR performance through voice activity detection for segmentation (Zhang et al., 2022; Ding and Tao, 2021), training the ASR on synthetic data with added punctuation, noise-filtering and domain-specific fine-tuning (Zhang and Ao, 2022; Li et al., 2022), or adding an intermediate model that cleans the ASR output in terms of casing and punctuation (Nguyen et al., 2021).…”
Section: Previous IWSLT Approaches (mentioning; confidence: 99%)
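The voice-activity-detection (VAD) segmentation mentioned in the statement above splits long recordings into speech-only chunks before they reach the ASR model. The following is a minimal sketch of that idea using the open-source webrtcvad package; the frame length and aggressiveness settings are illustrative choices, not those of any cited submission.

```python
# Illustrative VAD-based segmentation: scan fixed-size frames of 16-bit
# mono PCM audio and emit (start, end) spans of contiguous voiced frames.
# Parameters are illustrative, not taken from any cited system.
import webrtcvad

def vad_segments(pcm: bytes, sample_rate: int = 16_000,
                 frame_ms: int = 30, aggressiveness: int = 2):
    """Yield (start_sec, end_sec) spans of voiced audio."""
    vad = webrtcvad.Vad(aggressiveness)          # 0 (lenient) .. 3 (strict)
    bytes_per_frame = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    n_frames = len(pcm) // bytes_per_frame
    start = None
    for i in range(n_frames):
        frame = pcm[i * bytes_per_frame:(i + 1) * bytes_per_frame]
        t = i * frame_ms / 1000.0
        if vad.is_speech(frame, sample_rate):
            if start is None:
                start = t                        # a voiced region begins
        elif start is not None:
            yield (start, t)                     # voiced region just ended
            start = None
    if start is not None:
        yield (start, n_frames * frame_ms / 1000.0)
```

Each emitted span can then be cut from the original audio and passed to the ASR model as an independent utterance.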
“…Conversational Speech Translation: Work on conversational ST (Kumar et al., 2014a,b; Zanon Boito et al., 2022) has mainly focused on single-speaker speech, segmented either manually or automatically via voice activity detection. Manual segmentation was assumed in recent studies, based on the Fisher and CALLHOME corpora, on cascaded ST (Kumar et al., 2014b), E2E-ST (Weiss et al., 2017; Peng et al., 2023), simultaneous ASR & ST (Soky et al., 2022), streamed ST (Deng et al., 2022), and multilingual ST (Inaguma et al., 2019).…”
Section: Related Work (mentioning; confidence: 99%)
“…• ON-TRAC (Laurent et al., 2023) participated in two language pairs in the Low-Resource task as well as in this task. For this task, they focused on using SAMU-XLS-R as the multilingual, multimodal pretrained speech encoder and mBART as the text decoder.…”
Section: Submissions (mentioning; confidence: 99%)
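The pairing described in this statement, a multilingual self-supervised speech encoder feeding an mBART text decoder, follows a generic speech-encoder/text-decoder pattern. A minimal sketch with Hugging Face's SpeechEncoderDecoderModel is shown below; since SAMU-XLS-R weights are not assumed to be publicly available here, the plain XLS-R checkpoint stands in for it, so this illustrates the architecture rather than reproducing the ON-TRAC system.

```python
# Sketch of coupling a multilingual speech encoder with an mBART text
# decoder. "facebook/wav2vec2-xls-r-300m" is a stand-in for SAMU-XLS-R,
# which is not assumed to be a public checkpoint.
from transformers import (
    SpeechEncoderDecoderModel,
    AutoFeatureExtractor,
    AutoTokenizer,
)

model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m",   # speech encoder (stand-in for SAMU-XLS-R)
    "facebook/mbart-large-50",        # multilingual text decoder
)
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")

# Tell the decoder which language to generate and how padding is handled;
# "fr_XX" matches the French target side of the tracks discussed here.
model.config.decoder_start_token_id = tokenizer.lang_code_to_id["fr_XX"]
model.config.pad_token_id = tokenizer.pad_token_id
```

Fine-tuning such a model on paired speech-translation data is then a standard sequence-to-sequence training loop over the encoder's audio features and the decoder's target tokens.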
“…• ON-TRAC (Laurent et al., 2023) participated in the Pashto-French (one primary and three contrastive systems, both for constrained and unconstrained settings) and Tamasheq-French (one primary and five contrastive systems, all of which are unconstrained; cf. Table 44) tracks.…”
Section: Submissions (mentioning; confidence: 99%)