Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022) 2022
DOI: 10.18653/v1/2022.iwslt-1.24
CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022

Abstract: In this paper, we describe our submission to the Simultaneous Speech Translation task at IWSLT 2022. We explore strategies to utilize an offline model in a simultaneous setting without the need to modify the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being 3× faster than offline in terms of latency on the test set. We also show that the onlinized offline model outperforms the best IWSLT2021 simultaneous system in medium and high latenc…

Cited by 6 publications (9 citation statements). References 60 publications.
“…Specifically, [6] proposed a hold-n policy that removes the last n tokens from the model output, and local agreement, which finds the longest common prefix of the outputs obtained for two consecutive input contexts. Moreover, [7] showed that varying the chunk size can also be effectively applied alongside these policies.…”
Section: Latency-Quality Trade-off
confidence: 99%
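The two onlinization policies quoted above can be sketched in a few lines. This is an illustrative reconstruction, not code from the cited papers: outputs are assumed to be token lists, and `prev_output`/`curr_output` are assumed to come from re-decoding the growing input context at consecutive steps.

```python
def hold_n(output, n):
    """Hold-n policy: emit the output minus its last n tokens,
    since the tail is the part most likely to change with more context."""
    return output[:max(0, len(output) - n)]

def local_agreement(prev_output, curr_output):
    """Local agreement: emit only the longest common prefix of the
    outputs produced for two consecutive (growing) input contexts."""
    stable = []
    for a, b in zip(prev_output, curr_output):
        if a != b:
            break
        stable.append(a)
    return stable

# Example: the model keeps revising its tail as more speech arrives.
prev = ["the", "cat", "sat", "on"]
curr = ["the", "cat", "sat", "on", "the", "mat"]
print(hold_n(curr, 2))              # ['the', 'cat', 'sat', 'on']
print(local_agreement(prev, curr))  # ['the', 'cat', 'sat', 'on']
```

Both policies trade latency for stability: larger n (or stricter agreement) delays output but avoids emitting tokens that a later re-decode would retract.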
“…The advantage of the standard beam search is that the model can generate a complete translation for the current speech input. However, it is also prone to over-generation and low-quality translations toward the end of the context [7]. In ASR, the standard beam search with attentional models has shown poor length generalization [27].…”
Section: Latency-Quality Trade-off
confidence: 99%
“…Therefore, we proposed an improved version of the AL metric, which was later independently proposed under the name length-adaptive average lagging (LAAL; Papi et al., 2022). To remedy the over-generation problem, we proposed an improved version of the beam search algorithm in Polák et al. (2023b). While this led to significant improvements in the quality-latency trade-off, the decoding still relied on label-synchronous decoding.…”
Section: Quality-Latency Trade-off in SST
confidence: 99%
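To make the AL-vs-LAAL distinction concrete, here is a sketch of the published formulas (our illustration, not code from the cited papers): AL normalizes the per-token "ideal" delay by the reference length, so an over-generating system can appear artificially low-latency; LAAL instead normalizes by the longer of the hypothesis and reference lengths. `delays[i]` is assumed to be the number of source units read before emitting target token i.

```python
def average_lagging(delays, src_len, ref_len, hyp_len=None):
    """Average lagging (AL). Pass hyp_len to get the length-adaptive
    variant (LAAL), which uses max(|hyp|, |ref|) in place of |ref|."""
    tgt_len = ref_len if hyp_len is None else max(hyp_len, ref_len)
    gamma = tgt_len / src_len  # ideal emission rate (target units per source unit)
    total, tau = 0.0, 0
    for i, d in enumerate(delays, start=1):
        total += d - (i - 1) / gamma  # actual delay minus ideal delay
        tau = i
        if d >= src_len:  # stop at the first token emitted after the full source
            break
    return total / tau

delays = [3, 5, 7, 10]  # source units read before each emitted token
al   = average_lagging(delays, src_len=10, ref_len=4)             # 2.5
laal = average_lagging(delays, src_len=10, ref_len=4, hyp_len=8)  # 4.375
```

On the same delays, an over-generating hypothesis (hyp_len > ref_len) yields a larger LAAL than AL, so the metric no longer rewards padding the output with extra tokens.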