ASR corpus design for resource-scarce languages

Barnard, Etienne; Davel, Marelie H.; Heerden, Charl Johannes van

doi:10.21437/interspeech.2009-727

Cited by 35 publications (14 citation statements); references 6 publications.

“…The value of η_d is empirically selected based on the performance of the system indicated by the Equal Error Rate (EER). The EER value indicates the operating point where the system's false acceptance rate is equal to its false rejection rate (Beigi, 2011). η_d of 30 dB is found to present the best performance with the lowest EER.…”

Section: Data Augmentation For the Establishment Of The I-vector System (mentioning)

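The EER criterion described in the quotation can be made concrete with a short sketch. The function `equal_error_rate` below is a hypothetical helper, not code from the citing paper. It sweeps the decision threshold over every observed trial score and returns the point where the false acceptance and false rejection rates are closest. It assumes that higher scores favour the claimed speaker and relies only on NumPy.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over all observed scores.

    Assumption: higher scores mean "more likely the claimed speaker".
    Returns (eer, threshold_at_eer).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer, best_t = np.inf, None, None
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostor trials wrongly accepted
        frr = np.mean(genuine < t)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Average the two rates in case they never meet exactly
            best_gap, eer, best_t = gap, (far + frr) / 2, t
    return eer, best_t
```

Under this sketch, choosing η_d as described would amount to running the system once for each candidate value, computing the EER on the resulting scores, and keeping the value with the lowest EER.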

“…This in turn is realised using autocorrelation (Broersen, 2006). LPCC are calculated using a recursive process (Beigi, 2011).…”

Section: Multitaper-fitted LPCC (mentioning)

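The quotation names two steps: linear prediction realised through autocorrelation, followed by a recursion that turns predictor coefficients into cepstral coefficients. The following sketch illustrates both under an assumed textbook formulation, not the cited authors' code. It applies the Levinson-Durbin recursion to the frame autocorrelation and uses the predictor convention A(z) = 1 - sum_k a_k z^-k. It omits windowing, pre-emphasis, the gain term c_0, and the multitaper fitting named in the section title. All function names are hypothetical.

```python
import numpy as np


def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one analysis frame."""
    x = np.asarray(frame, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p, assuming x[n] ~ sum_k a_k x[n-k].

    Returns (a, prediction_error_energy).
    """
    a = np.zeros(order + 1)  # a[0] is unused so that a[k] matches a_k
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        updated = a.copy()
        updated[i] = k
        updated[1:i] = a[1:i] - k * a[i - 1 : 0 : -1]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Recursive LPC-to-cepstrum conversion for the all-pole model 1/A(z).

    c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_m = 0 for m > p.
    """
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] (log gain) is left at zero in this sketch
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=12):
    """LPCC for one frame: autocorrelation -> LPC -> cepstral recursion."""
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

For a first-order predictor with a_1 = a, the recursion reproduces the closed form c_n = a^n / n, which is a quick sanity check on the sign convention. An all-zero (silent) frame makes r[0] zero and would divide by zero here, so a practical front-end would need to guard against that case.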

“…Speaker recognition is becoming widely used for different applications, e.g. access control (security), audio indexing and forensic applications (Beigi, 2011). The front-end of a speaker recognition system is important because it can greatly affect the overall system performance.…”

Section: Introduction (mentioning)

Speaker recognition using PCA-based feature transformation

Ahmed, Chiverton, Ndzi et al. (2019), Speech Communication

“…The word "phone" was coined in [5] as an abbreviation for "phonetic symbol," defined in [6] as an element of a phonetic transcription corresponding to at most one phoneme, whose boundary times in the acoustic signal can be reliably identified using automatic forced alignment. Such language-dependent ASR segment inventories may be expressed using the language-independent symbols of the IPA [7], and their set union defines a language-independent phone inventory, which may be trained using multilingual data [8]; alternatively, language-dependent phone models may be trained using far less data than language-dependent word models, because the number of phones in a language is far fewer than the number of words [9]. In order to use phone-based acoustic models, however, it is necessary to discover the phone inventory of the unseen language.…”

Section: Introduction (mentioning)

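The set-union construction described in the quotation is easy to express in code. The sketch below uses three made-up inventories written in IPA symbols. The keys `lang_a`, `lang_b` and `lang_c` are hypothetical placeholders, not real languages. It builds their union as a language-independent inventory and records which languages contain each phone. This bookkeeping is one simple way to see which phones could draw training data from several languages.

```python
from collections import defaultdict

# Hypothetical language-dependent phone inventories, written with IPA symbols.
INVENTORIES = {
    "lang_a": {"p", "t", "k", "a", "i", "u"},
    "lang_b": {"p", "b", "t", "d", "a", "e", "o"},
    "lang_c": {"t", "k", "ʔ", "a", "i"},
}


def language_independent_inventory(inventories):
    """Union of all language-dependent inventories."""
    return set().union(*inventories.values())


def phone_to_languages(inventories):
    """Map each phone to the sorted list of languages whose inventory contains it."""
    owners = defaultdict(list)
    for lang in sorted(inventories):
        for phone in inventories[lang]:
            owners[phone].append(lang)
    return dict(owners)


if __name__ == "__main__":
    universal = language_independent_inventory(INVENTORIES)
    print(len(universal), sorted(universal))
    print(phone_to_languages(INVENTORIES)["t"])
```

With these placeholder inventories the union contains 11 phones. Only "ʔ" is unique to one language, while "t" and "a" occur in all three.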

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

Żelasko, Feng, Moro-Velázquez et al. (2022), Preprint

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way. To that

“…The limited availability of speech corpora is a major constraint on the development of automatic speech recognition (ASR) in under-resourced languages and dialects [1,2]. Consequently, there is significant interest in ways to develop such corpora efficiently [3], and the efficient exploitation of limited corpora [1,4].…”

Section: Introduction (mentioning)