ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

Elbayad, Maha; Nguyen, Ha Thanh; Bougares, Fethi; Tomashenko, Natalia; Caubrière, Antoine; Lecouteux, Benjamin; Estève, Yannick; Besacier, Laurent

doi:10.18653/v1/2020.iwslt-1.2

Cited by 14 publications

(20 citation statements)

References 24 publications

(22 reference statements)

“…Teams followed the suggestion to submit multiple systems per regime, which resulted in a total of 56 systems overall. ON-TRAC (Elbayad et al, 2020) participated in both the speech and text tracks. The authors used a hybrid pipeline for simultaneous speech (Ma et al, 2019).…”

Section: Submissions (mentioning)

confidence: 99%

“…ON-TRAC (Elbayad et al, 2020) participated with end-to-end systems, focusing on speech segmentation, data augmentation and the ensembling of multiple models. They experimented with several attention-based encoder-decoder models sharing the general backbone architecture described in , which comprises an encoder with two VGG-like (Simonyan and Zisserman, 2015) CNN blocks followed by five stacked BLSTM layers.…”

Section: Submissions (mentioning)

confidence: 99%

Findings of the IWSLT 2020 Evaluation Campaign

Ansari, Axelrod, Bach et al. 2020

Proceedings of the 17th International Conference on Spoken Language Translation

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured this year six challenge tracks: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation. A total of 30 teams participated in at least one of the tracks. This paper introduces each track's goal, data and evaluation metrics, and reports the results of the received submissions.

“…EN→DE Task The performances of text-to-text EN→DE task is shown in Figure 4(a). We can see that the performance of proposed CAAT is always better than that of wait-k with SBS and the best results from ON-TRAC in 2020 (Elbayad et al, 2020), especially in low latency regime, and the performance of CAAT with model ensemble is nearly equivalent to offline result. Moreover, it can be further noticed from Figure 4(a) that the model ensemble can also improve the BLEU score more or less under different latency regimes, and the increase is quite obvious in low latency regime.…”

Section: Text-to-text Simultaneous Translation (mentioning)

confidence: 84%

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021

Liu, Du, Li et al. 2021

Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes USTC-NELSLIP's submissions to the IWSLT2021 Simultaneous Speech Translation task. We proposed a novel simultaneous translation model, Cross Attention Augmented Transducer (CAAT), which extends conventional RNN-T to sequence-to-sequence tasks without monotonic constraints, e.g., simultaneous translation. Experiments on speech-to-text (S2T) and text-to-text (T2T) simultaneous translation tasks shows CAAT achieves better quality-latency trade-offs compared to wait-k, one of the previous state-of-the-art approaches. Based on CAAT architecture and data augmentation, we build S2T and T2T simultaneous translation systems in this evaluation campaign. Compared to last year's optimal systems, our S2T simultaneous translation system improves by an average of 11.3 BLEU for all latency regimes, and our T2T simultaneous translation system improves by an average of 4.6 BLEU.

“…Besides, SpecAugment [19] is used to train our EN-DE char model as well. Further details can be found in [10,5].…”

Section: Evaluation Of Simultaneous Speech Translation (mentioning)

confidence: 99%

An Empirical Study of End-To-End Simultaneous Speech Translation Decoding Strategies

Nguyen, Estève, Besacier 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach allows to control BLEU/Average Lagging trade-off along different latency regimes. Our best decoding settings achieve comparable results with a strong cascade model evaluated on the simultaneous translation track of IWSLT 2020 shared task.
