2014
DOI: 10.5937/telfor1402109s

Building a speech repository for a Serbian LVCSR system

Abstract: This paper describes the procedure of collecting speech and corresponding textual data and the processing needed to create a repository for training a LVCSR system for the Serbian language. The speech database for Serbian consists of speech recordings from audio books, radio programmes and talk shows, as well as read utterances from an array of male and female speakers. Currently, approximately 200 hours of speech recordings are collected, together with corresponding orthographic transcriptions which contain a…

citations
Cited by 6 publications
(4 citation statements)
references
References 2 publications
“…The first part contains audio book recordings, recorded in studio environment by professional speakers. A large part of this database was already mentioned in previous papers [14], but lately it has been expanded by several new audio books. This database part is responsible for 168 hours of data in total, out of which about 140 hours is pure speech (the rest is silence).…”
Section: Methods (mentioning)
confidence: 99%
“…The first part contains audio book recordings, recorded in studio environment by professional speakers. A large part of this database was already mentioned in previous papers [14], but lately it has been expanded by several new audio books. This database part is responsible for 168 hours of data in total, out of which about 140 hours is pure speech (the rest is silence).…”
Section: Methods (mentioning)
confidence: 99%
“…This needs to be kept in mind when analysing WER results for different functional styles. All audio recordings were sampled at 16 kHz, 16 bits per sample, mono PCM [26].…”
Section: Word Error Rate Evaluation (mentioning)
confidence: 99%
“…regarding the Torlak dialect (Vuković, 2021), resources for automatic speech recognition and synthesis (Delić et al., 2013; Suzić et al., 2014), and specialised spoken corpora, such as the SCECL 1 corpus on early child language (Anđelković et al., 2001) and SrMaCo 2 corpus on language of Serbian minority in Hungary.…”
Section: Introduction (mentioning)
confidence: 99%