Interspeech 2021
DOI: 10.21437/interspeech.2021-608

Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation

Abstract: In this paper, we present Cross Language Agent - Simultaneous Interpretation (CLASI), a high-quality and human-like Simultaneous Speech Translation (SiST) system. Inspired by professional human interpreters, we utilize a novel data-driven read-write strategy to balance translation quality and latency. To address the challenge of translating in-domain terminology, CLASI employs a multi-modal retrieving module to obtain relevant information to augment the translation. Supported by LLMs, our approach can ge…

Cited by 11 publications (6 citation statements)
References: 49 publications
“…Few studies (Elbayad et al, 2020a; Nguyen et al, 2021b) tried to improve the encoder part of simultaneous systems. Elbayad et al (2020a) and Nguyen et al (2021b) introduced the use of unidirectional encoders instead of standard bidirectional encoders (i.e. the encoder states are not updated after each read action) to speed up the decoding phase.…”
Section: Architectural Challenges (mentioning)
confidence: 99%
“…Strategy. Few studies (Elbayad et al, 2020a; Nguyen et al, 2021b) tried to improve the encoder part of simultaneous systems.…”
Section: Encoding (mentioning)
confidence: 99%
“…Elbayad et al (2020a) and Nguyen et al (2021b) introduced the use of unidirectional encoders instead of standard bidirectional encoders (i.e. the encoder states are not updated after each read action) to speed up the decoding phase.…”
Section: Encoding (mentioning)
confidence: 99%
“…SMAD aims to identify the temporal locations of speech, music, and their corresponding activity levels within a polyphonic mixture of audio signals. A reliable SMAD system can be used to extract relevant parts of audio signals in preparation for other speech or music focused tasks such as spoken language identification [1,2], speech recognition [3] and detection [4], speaker diarization, and singer identification [5]. For radio broadcasters and television services, by providing timing metadata about music and speech portion of the broadcasted content, SMAD can also help with a variety of tasks, such as data procurement for royalty payments and dialog loudness measurement.…”
Section: Introduction (mentioning)
confidence: 99%
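
The citation statements from the Architectural Challenges and Encoding sections say that with a unidirectional encoder "the encoder states are not updated after each read action". The toy sketch below is one way to illustrate that property; it is not the cited authors' model. The functions `encode_unidirectional`, `encode_bidirectional`, and `extend_unidirectional` are hypothetical names, and the "encoder" is a simple running mean chosen only so the arithmetic is easy to follow.

```python
from typing import List


def encode_unidirectional(frames: List[float]) -> List[float]:
    """Toy causal encoder: state i depends only on frames[0..i]."""
    states, running = [], 0.0
    for i, x in enumerate(frames):
        running += x
        states.append(x + running / (i + 1))  # frame plus mean of the prefix
    return states


def encode_bidirectional(frames: List[float]) -> List[float]:
    """Toy bidirectional encoder: every state depends on all frames read so far."""
    if not frames:
        return []
    mean = sum(frames) / len(frames)
    return [x + mean for x in frames]


def extend_unidirectional(cached: List[float], frames: List[float]) -> List[float]:
    """After a read action, reuse cached states and encode only the new frames."""
    states = list(cached)
    running = sum(frames[:len(cached)])
    for i in range(len(cached), len(frames)):
        running += frames[i]
        states.append(frames[i] + running / (i + 1))
    return states


before = [2.0, 4.0, 6.0]          # frames read so far
after = before + [8.0]            # one more read action

# Causal states for earlier positions are unchanged, so they can be cached.
uni_before = encode_unidirectional(before)
uni_after = extend_unidirectional(uni_before, after)
print(uni_after[:3] == uni_before)

# Bidirectional states for earlier positions change, so all must be recomputed.
print(encode_bidirectional(after)[:3] == encode_bidirectional(before))
```

In this sketch the saving comes from `extend_unidirectional` touching only the newly read frame, whereas the bidirectional variant has to re-encode the whole prefix after every read.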