In simultaneous translation, the retranslation approach has the advantage of requiring no modifications to the inference engine. However, to reduce the undesirable flicker in the output, previous work has resorted to increasing latency through masking and to introducing specialised inference procedures, thus losing the simplicity of the approach. In this work, we show that self-training improves the flicker-latency tradeoff while maintaining translation quality similar to that of the original system. Our analysis indicates that self-training reduces flicker by controlling monotonicity. Furthermore, self-training can be combined with biased beam search to further improve the flicker-latency tradeoff.
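To make the flicker and masking terminology above concrete, the following is a minimal sketch, not taken from the paper: an erasure-style flicker count between successive retranslations, and a mask-k heuristic that hides the last k tokens of each partial output, trading latency for stability. All function names and the example data are illustrative assumptions.

```python
def erasure(prev_tokens, new_tokens):
    """Count trailing tokens of the previous output that must be
    deleted because the new retranslation no longer begins with them
    (a common way to quantify flicker)."""
    common = 0
    for p, n in zip(prev_tokens, new_tokens):
        if p != n:
            break
        common += 1
    return len(prev_tokens) - common

def mask_k(tokens, k):
    """Withhold the last k tokens of a partial translation: unstable
    suffixes stay hidden, reducing flicker at the cost of latency."""
    return tokens[: max(0, len(tokens) - k)]

# Successive retranslations of a growing source prefix.
prev = "the cat sat on".split()
extended = "the cat sat on the mat".split()          # stable extension
revised = "the cat is sitting on the mat".split()    # non-monotonic revision
print(erasure(prev, extended))     # 0 -> no flicker
print(erasure(extended, revised))  # 4 -> four displayed tokens erased
print(mask_k(revised, 2))          # ['the', 'cat', 'is', 'sitting', 'on']
```

Under this view, a more monotonic system is one whose successive retranslations are mostly stable extensions, so the erasure count stays near zero even without masking.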