2011
DOI: 10.1007/s10579-011-9152-1

Collecting and evaluating speech recognition corpora for 11 South African languages

Abstract: We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which contains data from the eleven official languages of South Africa. Because of practical constraints, the amount of speech per language is relatively small compared to major corpora in world languages, and we report on our investigation of the stability of the ASR models derived from the corpus. We also report on phoneme distance measures across languages, and describe initial phone recognisers that were deve…

Cited by 41 publications (11 citation statements)
References 18 publications
“…This is especially useful for underresourced languages lacking a substantial set of training data for a complete ASR system (Badenhorst et al, 2011; Livescu et al, 2012; Li, 2008a, 2008b). The accuracy of this strategy depends closely on the fit between the phone set used to build the aligner and the phone set of the target language.…”
Section: A Forced Alignment (mentioning)
confidence: 99%
“…Second, forced alignment systems are often trained on large corpora (often between ten and hundreds of hours of speech) from a large number of speakers, e.g., between 40 (Malfrère et al, 2003) and 630 speakers (Garofolo et al, 1993). Even an automatic speech recognition system (ASR) built for under-resourced languages typically involves recordings from hundreds of speakers (Badenhorst et al, 2011). Transcribed corpora from endangered language documentation projects typically come from a handful of speakers and typically consist of between 5-60 h of recordings.…”
Section: Introduction (mentioning)
confidence: 99%
“…Initial projects were funded by the Department of Arts, Culture, Science and Technology (DACST) and subsequently by the Departments of Arts and Culture (DAC) and Science and Technology (DST), respectively, after the two departments became separate entities. For instance, the African Speech Technology (AST) project [1] was supported by DACST, while DAC funded projects like Lwazi [2,3] and the National Centre for Human Language Technology (NCHLT) speech [4,5] and text [6] projects. The recently-established South African Centre for Digital Language Resources (https://www.sadilar.…”
Section: Introduction (mentioning)
confidence: 99%
“…In South Africa, the AST, Lwazi, and the first NCHLT project relied on data collection to create speech resources for the indigenous languages. During the Lwazi project, telephone speech was collected (between four and ten hours per language [2]), while the aim of the first NCHLT project was to collect 50-60 h of orthographically-transcribed, broadband speech in each of the country's 11 official languages [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…The Lwazi project was set up to develop a telephone-based speech-driven information system in South Africa. In the project, the Lwazi ASR corpus [77] words, dates, and numbers. A 5,000-word pronunciation dictionary was also created for each language, which covers only the most common words in the language.…”
Section: The Lwazi Speech Corpus (mentioning)
confidence: 99%