2022
DOI: 10.48550/arxiv.2203.15479
Preprint

Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Abstract: Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular voice activity detection (VAD) tools such as WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult for VAD to detect. In this study, we propose a speech segmentation method using a binary classification model trained on a segmented bilingual speech corpus. We also pr…
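The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It assumes hypothetical per-frame inputs: boolean speech flags from a VAD and, for the learned alternative, per-frame probabilities that a frame lies inside a segment. The helper name `frames_to_segments`, the 0.5 threshold, and the toy values are illustrative assumptions, not the paper's model or label definition. Both inputs go through the same grouping rule, so the only difference is the frame scores: a VAD that misses a one-frame pause merges two sentences, whereas a classifier trained on sentence-level segmentation can mark that pause as a boundary.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def frames_to_segments(in_segment: List[bool], min_gap: int = 1) -> List[Segment]:
    """Group frames flagged True into segments; a run of at least
    `min_gap` False frames closes the current segment."""
    segments: List[Segment] = []
    start: Optional[int] = None  # first frame of the open segment, if any
    gap = 0                      # length of the current run of False frames
    for i, flag in enumerate(in_segment):
        if flag:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The segment ends at the first frame of the gap.
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        # Close a segment still open at the end, trimming a trailing short gap.
        segments.append((start, len(in_segment) - gap))
    return segments


# Hypothetical frames: two sentences joined by a 1-frame pause (frame 2),
# then a longer pause (frames 5-7) before a third utterance.
vad_speech = [True, True, True, True, True, False, False, False, True]  # VAD misses frame 2
clf_prob = [0.9, 0.8, 0.2, 0.9, 0.7, 0.1, 0.1, 0.2, 0.8]  # hypothetical classifier output

pause_based = frames_to_segments(vad_speech)
classifier_based = frames_to_segments([p >= 0.5 for p in clf_prob])
print(pause_based)       # [(0, 5), (8, 9)]
print(classifier_based)  # [(0, 2), (3, 5), (8, 9)]
```

Keeping the grouping rule identical for both inputs isolates the point of the abstract: the improvement has to come from better frame-level decisions, not from tuning the pause threshold.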


Cited by 1 publication (2 citation statements)
References 16 publications
“…To this end, researchers tried considering not only the presence of speech but also its length (Potapczyk and Przybysz, 2020; Inaguma et al, 2021;. Later studies tried to avoid VAD and focused on more linguistically motivated approaches, e.g., ASR CTC to predict voiced regions Gállego et al (2021) or directly modeling the sentence segmentation (Tsiamas et al, 2022b; Fukuda et al, 2022).…”
Section: Long-form Offline ST (mentioning)
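One way to account for segment length in addition to speech presence is to cap segment duration by recursively splitting any over-long segment at its least speech-like frame. The following is a hypothetical Python sketch, not the procedure of any work cited here: `enforce_max_length`, `max_len`, and `min_len` are illustrative names, and `score` stands for any per-frame score where low values suggest a pause (for example, a VAD speech probability).

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def enforce_max_length(seg: Segment, score: Sequence[float], max_len: int,
                       min_len: int = 1) -> List[Segment]:
    """Recursively split `seg` at its lowest-scoring interior frame until
    every piece is at most `max_len` frames long."""
    if max_len < 2 * min_len:
        raise ValueError("max_len must be at least 2 * min_len")
    start, end = seg
    if end - start <= max_len:
        return [seg]
    # Candidate split frames keep both halves at least `min_len` frames long;
    # the chosen frame becomes the first frame of the right-hand piece.
    candidates = range(start + min_len, end - min_len + 1)
    split = min(candidates, key=lambda i: score[i])
    return (enforce_max_length((start, split), score, max_len, min_len)
            + enforce_max_length((split, end), score, max_len, min_len))


scores = [0.9, 0.9, 0.3, 0.9, 0.9, 0.1, 0.9, 0.9]  # toy per-frame scores
print(enforce_max_length((0, 8), scores, max_len=4))  # [(0, 2), (2, 5), (5, 8)]
```

Splitting at the minimum score first means the deepest likely pause is used before shallower ones, so the length cap is met while cutting as few likely-speech frames as possible.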
“…Drawing inspiration from offline long-form ST, which primarily emphasizes segmentation, we consider direct segmentation modeling the most promising approach (Tsiamas et al, 2022a; Fukuda et al, 2022). The limitation of these approaches is that they do not allow out-of-the-box simultaneous inference.…”
Section: Towards the Long-form SST Via (mentioning)