Acoustic data acquisition for under-resourced languages is an important and challenging task. In the Icelandic parliament, Althingi, all delivered speeches are transcribed manually and published as text on Althingi's web page. To reduce the manual work involved, an automatic speech recognition (ASR) system is being developed for Althingi. In this paper, the development of a speech corpus suitable for training a parliamentary ASR system is described. Text and audio data of manually transcribed speeches were processed to build an aligned, segmented corpus, for which language-specific tasks had to be developed specifically for Icelandic. The resulting corpus of 542 hours of speech is freely available at http://www.malfong.is. First experiments with an ASR system trained on the Althingi corpus have been conducted, showing promising results: a word error rate of 16.38% was obtained using a time-delay deep neural network (TD-DNN) and 14.76% using a long short-term memory recurrent neural network (LSTM-RNN) architecture. The Althingi corpus is, to our knowledge, the largest speech corpus currently available in Icelandic. The corpus, as well as the developed methods for corpus creation, constitutes a valuable resource for further developments within Icelandic language technology.
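The figures above are word error rates, i.e. the word-level edit distance between the recognizer output and the reference transcript, normalized by the number of reference words. The following minimal Python sketch illustrates that definition; it is not the evaluation code used in the paper, and the example sentences are made up.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in a four-word reference gives WER = 0.25 (25%).
print(wer("þingmaður tók til máls", "þingmaðurinn tók til máls"))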
Building acoustic databases for speech recognition is very important for under-resourced languages. To build a speech recognition system, a large amount of speech data from a considerable number of participants needs to be collected. Eyra is a toolkit that can be used to gather acoustic data from a large number of participants in a relatively straightforward fashion. Predetermined prompts are downloaded onto a client, typically running on a smartphone, where the participant reads them aloud so that the recording and its corresponding prompt can be uploaded. This paper presents the Eyra toolkit, its quality control routines and its annotation mechanism. The quality control relies on a forced-alignment module, which gives feedback to the participant, and an annotation module that allows data collectors to rate the read prompts after they are uploaded to the system. The paper presents an analysis of the performance of the quality control and describes two data collections, for Icelandic and Javanese.
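To make the forced-alignment quality control concrete, the sketch below flags a recording when the aligner can place too few of the prompt words in the audio. This is a hypothetical illustration only: the function name, the coverage threshold, and the alignment format are assumptions and do not reflect Eyra's actual implementation or API.

from typing import List, Tuple

def flag_recording(prompt: str,
                   alignment: List[Tuple[str, float, float]],
                   min_coverage: float = 0.8) -> bool:
    """Hypothetical check: flag a recording for review when forced alignment
    covers too small a fraction of the prompt words.
    `alignment` is assumed to be a list of (word, start_sec, end_sec) tuples
    produced by an external aligner."""
    prompt_words = prompt.lower().split()
    aligned_words = [w.lower() for w, _, _ in alignment]
    matched = sum(1 for w in prompt_words if w in aligned_words)
    coverage = matched / len(prompt_words) if prompt_words else 0.0
    return coverage < min_coverage

# Only two of four prompt words were aligned -> coverage 0.5 -> flagged (True).
print(flag_recording("opið alþingi í dag",
                     [("opið", 0.1, 0.5), ("alþingi", 0.6, 1.2)]))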
All speeches delivered in the Icelandic parliament, Althingi, are transcribed and published. An automatic speech recognition (ASR) system has been developed to reduce the manual work involved. To our knowledge, this is the first open-source speech recognizer in use for Icelandic. In this paper, the development of the ASR system is described, in-lab system performance is evaluated, and first results from the users are reported. A word error rate (WER) of 7.91% was obtained on our in-lab speech recognition test set using a time-delay deep neural network (TDNN) and re-scoring with a bidirectional recurrent neural network language model (RNN-LM); no further processing of the text is included in that number. The in-lab F-score is 80.6 for the punctuation model and 61.6 for the paragraph model. The WER of the ASR, including punctuation marks and other post-processing, was 15.0 ± 6.0% over 625 speeches when tested in the wild. This is an upper limit, since not all mismatches with the reference text are true errors of the ASR. The transcribers of Althingi graded 77% of the speech transcripts as Good. The Althingi corpus and ASR recipe constitute a valuable resource for further developments within Icelandic language technology.
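The punctuation and paragraph models are scored with an F-score, the harmonic mean of precision and recall over the predicted marks. The short sketch below shows that computation; the counts in the example are invented for illustration, and the paper's evaluation may count marks differently.

def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 score from raw counts of correct, spurious, and missed predictions."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Example: 80 correctly placed marks, 20 spurious, 18 missed -> F ≈ 80.8 (on a 0-100 scale).
print(round(100 * f_score(80, 20, 18), 1))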