Large vocabulary continuous speech recognition in Greek: corpus and an automatic dictation system

Digalakis, Vassilios; Oikonomidis, Dimitris; Pratsolis, Dimitris; Tsourakis, Nikos; Vosnidis, Christos; Chatzichrisafis, Nikos; Diakoloukas, Vassilios

doi:10.21437/eurospeech.2003-458

Cited by 11 publications

(2 citation statements)

References 6 publications

“…Initially the raw recordings are segmented into 30 second segments and the transcriptions are split into smaller segments of approximately 1000 words called documents. Each segment is decoded using a seed acoustic model trained on the Logotypografia corpus [66] and a 4gram biased LM trained on the corresponding transcription of each recording. The best path transcript of each segment is obtained and paired with the best matching document via TF-IDF similarity.…”

Section: A. Collection and Curation of HParl (mentioning)

confidence: 99%
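The pairing step described in the citation statement above (matching each decoded segment to its most similar transcript document via TF-IDF) can be illustrated with a minimal sketch. This is not the citing authors' implementation: the function names, whitespace tokenization, smoothed IDF formula, cosine scoring, and toy English strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per tokenized text."""
    n = len(token_lists)
    # Document frequency: the number of texts in which each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumed variant) so every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: freq * idf[t] for t, freq in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matching_document(hypothesis, documents):
    """Return (index, score) of the document most similar to the hypothesis.

    Assumes `documents` is non-empty. The hypothesis is included when
    computing IDF, which is a simplification chosen for this sketch.
    """
    texts = [d.lower().split() for d in documents] + [hypothesis.lower().split()]
    vectors = tfidf_vectors(texts)
    hyp_vec = vectors[-1]
    scores = [cosine(hyp_vec, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical toy data standing in for transcript documents and a decoded segment.
documents = [
    "the budget debate continued in the plenary session",
    "the minister answered questions about public health",
]
hypothesis = "minister answer questions on health"
index, score = best_matching_document(hypothesis, documents)
print(index, round(score, 3))
```

In this toy case the hypothesis shares three terms with the second document and none with the first, so the second document is selected even though the hypothesis contains recognition-style errors.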


Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A case study for Modern Greek

Paraskevopoulos, Kouzelis, Rouvalis et al. 2023

Preprint


Modern speech recognition systems exhibit rapid performance degradation under domain shift. This issue is especially prevalent in data-scarce settings, such as low-resource languages, where diversity of training data is limited.

In this work we propose M2DS2, a simple and sample-efficient finetuning strategy for large pretrained speech models, based on mixed source and target domain self-supervision. We find that including source domain self-supervision stabilizes training and avoids mode collapse of the latent representations. For evaluation, we collect HParl, a 120-hour speech corpus for Greek, consisting of plenary sessions in the Greek Parliament. We merge HParl with two popular Greek corpora to create GREC-MD, a test-bed for multi-domain evaluation of Greek ASR systems. In our experiments we find that, while other Unsupervised Domain Adaptation baselines fail in this resource-constrained environment, M2DS2 yields significant improvements for cross-domain adaptation, even when only a few hours of in-domain audio are available. When we relax the problem to a weakly supervised setting, we find that independent adaptation for audio using M2DS2 and for language using simple LM augmentation techniques is particularly effective, yielding word error rates comparable to the fully supervised baselines.


“…2) Logotypografia: Logotypografia [66] is one of the first corpora for Large Vocabulary Continuous Speech Recognition in Greek. The dataset contains 33,136 newscast utterances, or 72 hours of speech.…”

Section: B. Including Corpora From Different Domains (mentioning)

confidence: 99%