Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
DOI: 10.18653/v1/2020.emnlp-demos.19
SIMULEVAL: An Evaluation Toolkit for Simultaneous Translation

Abstract: Simultaneous translation on both text and speech focuses on a real-time, low-latency scenario in which the model starts translating before it has read the complete source input. Evaluating simultaneous translation models is more complex than evaluating offline models, because latency must be considered in addition to translation quality. Despite its growing focus on novel modeling approaches to simultaneous translation, the research community currently lacks a universal evaluation procedure. Therefore, we present SIMULEVAL, an evaluation toolkit for simultaneous translation on both text and speech.
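To make the scenario concrete: a simultaneous system interleaves READ actions (consume more source) with WRITE actions (emit a target token), and an evaluator must record how much source had been read when each target token was produced. Below is a minimal, illustrative Python sketch of driving a wait-k policy; `simulate_wait_k` and `translate_prefix` are hypothetical names, not part of SIMULEVAL's API.

    def simulate_wait_k(source_tokens, k, translate_prefix):
        """Drive a wait-k policy: READ the first k source tokens, then
        alternate WRITE and READ. Returns the translation plus, for each
        target token, the number of source tokens read when it was emitted
        -- the raw delays that latency metrics are computed from.
        Assumes translate_prefix eventually returns "</s>"."""
        target, delays = [], []
        read = 0
        while True:
            if read < min(k + len(target), len(source_tokens)):
                read += 1  # READ: consume one more source token
            else:
                token = translate_prefix(source_tokens[:read], target)
                if token == "</s>":  # model signals end of translation
                    break
                target.append(token)  # WRITE: commit one target token
                delays.append(read)
        return target, delays

With k = 3, a six-token source, and a six-token output, the recorded delays would be [3, 4, 5, 6, 6, 6]: exactly the input that latency metrics such as Average Lagging consume.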

Cited by 44 publications (44 citation statements) · References 10 publications
“…The evaluation was run with the SIMULEVAL toolkit (Ma et al., 2020a). For the latency measurement of speech input systems, we contrasted computation-aware and non computation-aware latency metrics.…”
Section: Data and Metrics (mentioning)
confidence: 99%
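The distinction drawn here: a non computation-aware delay counts only how much source audio had been consumed when a target token was emitted, while a computation-aware delay also charges the model for the wall-clock time it spent computing. A minimal sketch of recording both under that convention (illustrative names, not the toolkit's internal bookkeeping):

    import time

    class DelayRecorder:
        """Record both delay variants for every emitted target token:
        non computation-aware (ms of source audio consumed) and
        computation-aware (consumed audio plus accumulated compute time)."""

        def __init__(self):
            self.compute_ms = 0.0
            self.delays_nca, self.delays_ca = [], []

        def timed_step(self, decode_step):
            # Run one decoding step and accumulate its wall-clock cost.
            start = time.perf_counter()
            result = decode_step()
            self.compute_ms += (time.perf_counter() - start) * 1000.0
            return result

        def on_write(self, consumed_audio_ms):
            # Call whenever the model emits a token.
            self.delays_nca.append(consumed_audio_ms)
            self.delays_ca.append(consumed_audio_ms + self.compute_ms)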
“…In CS, ST is typically evaluated in terms of quality and latency. As in MT, the approach consists in applying automatic metrics, which allow for a fast and objective evaluation of the systems (Ma et al., 2020). However, due to its novelty, the ST research community currently lacks a universally adopted evaluation methodology.…”
Section: Related Work (mentioning)
confidence: 99%
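On the quality side, a common automatic metric is corpus-level BLEU; a minimal example with the sacrebleu package (assuming it is installed):

    import sacrebleu

    hypotheses = ["the cat sat on the mat"]    # system outputs
    references = [["the cat sat on the mat"]]  # one inner list per reference set

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")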
“…Gile, 2009), represents a more challenging task that still lacks sufficient clarity and consistency. In this context, several metrics have been introduced, such as Average Proportion (AP) (Cho and Esipova, 2016), Consecutive Wait length (CW) (Gu et al., 2017), Average Lagging (AL) (Ma et al., 2020), and Differentiable Average Lagging (DAL) (Cherry and Foster, 2019). Generally speaking, the evaluation approach used in CS is product-oriented.…”
Section: Related Work (mentioning)
confidence: 99%
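A minimal sketch of two of these metrics under their common definitions: AP is the mean delay normalized by source and target length, and AL averages how far the system lags behind an ideal wait-0 policy, counted up to the first target token emitted after the whole source has been read. Here `delays[i]` is the number of source tokens read when target token i was produced.

    def average_proportion(delays, src_len):
        """AP: mean delay, normalized so that 0 < AP <= 1;
        lower means the model commits earlier."""
        return sum(delays) / (src_len * len(delays))

    def average_lagging(delays, src_len):
        """AL: average number of source tokens the system lags behind
        an ideal wait-0 policy, averaged up to the first target token
        emitted after the full source has been read."""
        gamma = len(delays) / src_len  # target-to-source length ratio
        # tau: 1-based index of the first token emitted with the full source read
        tau = next((i + 1 for i, d in enumerate(delays) if d >= src_len),
                   len(delays))
        return sum(delays[i] - i / gamma for i in range(tau)) / tau

For a wait-3 run with delays [3, 4, 5, 6, 6, 6] on a six-token source, this gives AP ≈ 0.83 and AL = 3.0, i.e. the system lags the source by three tokens on average.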
“…For the S2T task, input speech is simply segmented into utterances with a duration of 20 seconds, and each segmented piece is sent directly to our simultaneous translation systems to obtain the streaming results. We found an abnormally large average lagging (AL) on the IWSLT tst2018 test set with the existing SimulEval toolkit (Ma et al., 2020a) and this segmentation strategy, so the relevant results are not presented here. A more reasonable latency criterion may be needed for unsegmented data in the future.…”
Section: Unsegmented Data Processing (mentioning)
confidence: 99%
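For reference, the fixed-length segmentation described above is straightforward to express; a hypothetical sketch (boundaries in milliseconds, not the system's actual preprocessing code):

    def fixed_segments(total_ms, chunk_ms=20_000):
        """Split an unsegmented audio stream into fixed 20-second
        utterances; the last piece keeps whatever remains."""
        return [(start, min(start + chunk_ms, total_ms))
                for start in range(0, total_ms, chunk_ms)]

    # A 65-second recording yields four pieces, the last only 5 s long:
    # [(0, 20000), (20000, 40000), (40000, 60000), (60000, 65000)]
    print(fixed_segments(65_000))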