Automatic speech recognition (ASR) systems require large amounts of transcribed speech data for training state-of-the-art deep neural network (DNN) acoustic models. Transcribed speech is a scarce and expensive resource, and ASR systems are prone to underperform in domains where little training data is available. In this work, we open up a vast and previously unused resource of transcribed speech for Finnish by retrieving and aligning all the recordings and meeting transcripts from the web portal of the Parliament of Finland. Short speech-text segment pairs are retrieved from the audio and text material by using the Levenshtein algorithm to align the first-pass ASR hypotheses with the corresponding meeting transcripts. DNN acoustic models are trained on the automatically constructed corpus, and their performance is compared to that of models trained on a commercially available speech corpus. Model performance is evaluated on Finnish parliament speech, with the test set divided into seen and unseen speakers. Performance is also evaluated on broadcast speech to test the general applicability of the parliament speech corpus. We also study the use of meeting transcripts in language model adaptation to achieve additional gains in recognition accuracy on Finnish parliament speech.
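The alignment step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `align_words` and `matching_runs`, the `min_len` threshold, and the example sentences are all assumptions made for illustration. The sketch aligns a first-pass hypothesis with a transcript at the word level using Levenshtein distance, then keeps runs of consecutive exact matches as candidate segment text.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_words(hyp: List[str], ref: List[str]) -> List[Pair]:
    """Levenshtein-align two word sequences; None marks a gap on that side."""
    n, m = len(hyp), len(ref)
    # dist[i][j] = edit distance between hyp[:i] and ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])):
            pairs.append((hyp[i - 1], ref[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((hyp[i - 1], None))  # word only in the hypothesis
            i -= 1
        else:
            pairs.append((None, ref[j - 1]))  # word only in the transcript
            j -= 1
    pairs.reverse()
    return pairs


def matching_runs(pairs: List[Pair], min_len: int = 3) -> List[List[str]]:
    """Keep runs of at least min_len consecutive exact matches (hypothetical rule)."""
    runs: List[List[str]] = []
    current: List[str] = []
    for h, r in pairs:
        if h is not None and h == r:
            current.append(r)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = []
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Invented example: the transcript contains two words the recognizer missed.
hyp = "arvoisa puhemies hallitus on tehnyt esityksen".split()
ref = "arvoisa rouva puhemies hallitus on nyt tehnyt esityksen".split()
pairs = align_words(hyp, ref)
segments = matching_runs(pairs, min_len=3)
```

In a real pipeline the matched words would also carry the time stamps produced by the recognizer, so that each retained run maps back to an audio span; that bookkeeping is omitted here.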
One particular problem in large-vocabulary continuous speech recognition for low-resourced languages is finding relevant training data for the statistical language models. A large amount of data is required because the models should estimate the probability of all possible word sequences. For Finnish, Estonian and the other Finno-Ugric languages, a particular problem with the data is the huge number of different word forms that are common in normal speech. The same problem also exists in other language technology applications, such as machine translation and information retrieval, and to some extent in other morphologically rich languages. In this paper we present methods and evaluations for four recent language modeling topics: selecting conversational data from the Internet, adapting models for foreign words, multi-domain and adapted neural network language modeling, and decoding with subword units. Our evaluations show that the same methods work in more than one language and that they scale down to smaller data resources.
The field of automatic speech recognition (ASR) has improved substantially in recent years. We are at a point never seen before, where such algorithms can be applied in non-ideal conditions such as real classrooms. In these scenarios it is still not possible to reach perfect recognition rates; however, we can already take advantage of these improvements. This paper shows preliminary results of using ASR in Chilean and Finnish middle and high schools to automatically provide teachers with a visualization of the structure of the concepts present in their discourse in science classrooms. These visualizations are conceptual networks that relate the key concepts used by the teacher. The tool gives teachers feedback about their pedagogical practice in class. Initial comparisons show great similarity between conceptual networks generated manually and those generated automatically.
In this paper, we improve a morph-based speech recognition system by focusing adaptation efforts on acronyms (ACRs) and foreign proper names (FPNs). An unsupervised language model (LM) adaptation framework based on two-pass decoding is used. Vocabulary adaptation is applied alongside unsupervised LM adaptation. The aim is to improve both language and pronunciation modeling for FPNs and ACRs. A smart selection algorithm is used to find the most likely topically related foreign words and acronyms from in-domain text. New pronunciation rules are generated for the selected words. Different kinds of morpheme adaptation operations are also evaluated on the ACR and FPN candidate words to ensure that pronunciation adaptation yields the best results. Statistically significant improvements in average word error rate (WER) and term error rate (TER) are achieved using a combination of unsupervised LM adaptation and vocabulary adaptation focused on ACRs and FPNs.
Index Terms: foreign word detection, morph-based speech recognition, out-of-vocabulary (OOV) recognition, unsupervised language model (LM) adaptation.
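For readers unfamiliar with the WER metric reported above, the following is a minimal sketch of its usual definition: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are assumptions for illustration, and the sketch does not cover TER, whose exact definition is not given in this abstract.

```python
from typing import List


def word_error_rate(ref: List[str], hyp: List[str]) -> float:
    """WER = word-level edit distance / number of reference words."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j - 1] + (r != h),  # match or substitution
                           prev[j] + 1,             # deletion of a reference word
                           cur[j - 1] + 1))         # insertion of a hypothesis word
        prev = cur
    return prev[-1] / len(ref)


# Invented example: one substituted foreign name and one deleted word.
reference = "the nato summit ends today".split()
hypothesis = "the nature summit ends".split()
wer = word_error_rate(reference, hypothesis)
```

Because a single misrecognized foreign name or acronym counts the same as any other word error, a metric restricted to selected terms can reveal improvements that the average WER partly hides.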
Improving the performance of distant speech recognition is of considerable current interest, driven by a desire to bring speech recognition into people's homes. Standard approaches to this task aim to enhance the signal prior to recognition, typically using beamforming techniques on multiple channels. Only a few real-world recordings are available that allow experimentation with such techniques. This has become even more pertinent with recent work on deep neural networks that aim to learn beamforming from data. Such approaches require large multichannel training sets, ideally with location annotation for moving speakers, which is scarce in existing corpora. This paper presents a new, freely available extended corpus of English speech recordings in a natural setting, with moving speakers. The data is recorded with diverse microphone arrays and, uniquely, with ground truth location tracking. It extends the 8.0-hour Sheffield Wargames Corpus released at Interspeech 2013 with a further 16.6 hours of fully annotated data, including 6.1 hours of female speech to reduce gender bias. Additional blog-based language model data is provided alongside, as well as a Kaldi baseline system. Results are reported with a standard Kaldi configuration and a baseline meeting recognition system.