ESPnet-ST IWSLT 2021 Offline Speech Translation System

Inaguma, Hirofumi; Yan, Brian; Dalmia, Siddharth; Guo, Pengcheng; Shi, Jiatong; Duh, Kevin; Watanabe, Shinji

doi:10.18653/v1/2021.iwslt-1.10

Cited by 12 publications

(7 citation statements)

References 29 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…This is usually sub-optimal as speakers place pauses inside sentences, not necessarily between them (e.g., hesitations before words with high information content, Goldman-Eisler, 1958). To this end, researchers tried considering not only the presence of speech but also its length (Potapczyk and Przybysz, 2020;Inaguma et al, 2021;. Later studies tried to avoid VAD and focused on more linguisticallymotivated approaches, e.g., ASR CTC to predict voiced regions Gállego et al (2021) or directly modeling the sentence segmentation (Tsiamas et al, 2022b;Fukuda et al, 2022).…”

Section: Long-form Offline Stmentioning

confidence: 99%

CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022

Polák¹,

Pham²,

Nguyen³

et al. 2022

Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

View full text Add to dashboard Cite

In this paper, we describe our submission to the Simultaneous Speech Translation at IWSLT 2022. We explore strategies to utilize an offline model in a simultaneous setting without the need to modify the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being 3× faster than offline in terms of latency on the test set. We also show that the onlinized offline model outperforms the best IWSLT2021 simultaneous system in medium and high latency regimes and is almost on par in the low latency regime. We make our system publicly available. 1

show abstract

Section: Long-form Offline Stmentioning

confidence: 99%

CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022

Polák¹,

Pham²,

Nguyen³

et al. 2022

Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

View full text Add to dashboard Cite

show abstract

“…Merging the short segments helps the ST model utilize the context information. So we follow the algorithm in (Inaguma et al, 2021) to merge the short segments after the segmentation.…”

Section: Speech Segmentationmentioning

confidence: 99%

“…The pyannote toolkit improve the performance significantly compared to the given segmentation. The merge algorithm from Inaguma et al (2021) further decreases the WER. We adjust two parameters of merge algorithm, M dur and M int .…”

Section: Cascade Speech Translationmentioning

confidence: 99%

“…This paper also tries to improve the system performance by exploring various techniques for the related tasks. (1) To boost the performance with advanced speech segmentation (Anastasopoulos et al, 2021), we apply the pyannote toolkit (Bredin et al, 2020) and the merge algorithm from Inaguma et al (2021) to segment the audio. Particularly, to overcome the long sentence problem in the dataset, we design a new segment algorithm.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

The YiTrans Speech Translation System for IWSLT 2022 Offline Shared Task

Zhang¹

2022

Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

View full text Add to dashboard Cite

This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task, which translates from English audio to German, Chinese, and Japanese. The YiTrans system is built on large-scale pre-trained encoder-decoder models. More specifically, we first design a multistage pre-training strategy to build a multimodality model with a large amount of labeled and unlabeled data. We then fine-tune the corresponding components of the model for the downstream speech translation tasks. Moreover, we make various efforts to improve performance, such as data filtering, data augmentation, speech segmentation, model ensemble, and so on. Experimental results show that our YiTrans system obtains a significant improvement than the strong baseline on three translation directions, and it achieves +5.2 BLEU improvements over last year's optimal end-to-end system on tst2021 English-German. Our final submissions rank first on English-German and English-Chinese end-toend systems in terms of the automatic evaluation metric. We make our code and models publicly available. 1 * Equal contributions during internship at Microsoft Research Asia. 1 https://github.com/microsoft/SpeechT5 3 http://data.statmt.org/news-commentary 4 http://www.statmt.org/europarl/v10 5 http://data.statmt.org/news-crawl 6 http://data.statmt.org/ngrams

show abstract

“…In recent studies, many speech segmentation methods based on VAD have been proposed for ST. Gaido et al [12] and Inaguma et al [13] used the heuristic concatenation of VAD segments up to a fixed length to address the over-segmentation problem. Gállego et al [14] used a pre-trained ASR model called wav2Vec 2.0 [15] for silence detection.…”

Section: Related Workmentioning

confidence: 99%

Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Fukuda¹,

Sudoh²,

Nakamura³

2022

Preprint

View full text Add to dashboard Cite

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD 1 have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segmentation method using a binary classification model trained using a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD and the above speech segmentation method. Experimental results revealed that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods. The hybrid approach further improved the translation performance.

show abstract

ESPnet-ST IWSLT 2021 Offline Speech Translation System

Cited by 12 publications

References 29 publications

CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022

CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022

The YiTrans Speech Translation System for IWSLT 2022 Offline Shared Task

Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Contact Info

Product

Resources

About