Multilingual Speech Recognition

Nöth, Elmar; Harbeck, Stefan; Niemann, Heinrich

doi:10.1007/978-3-642-60087-6_31

Cited by 5 publications

(1 citation statement)

References 5 publications

“…Hence, the discrimination between each such pair of languages becomes challenging for language recognition systems. The applications of automatic language recognition evidently appear in spoken language translation [27], multilingual speech recognition [28], and spoken document retrieval [29].…”

Section: Background, A. Speech Signals; classification: mentioning (confidence: 99%)

Speech Recognition Using Deep Neural Networks: A Systematic Review

et al. 2019

Over the past decades, a tremendous amount of research has been done on the use of machine learning for speech processing applications, especially speech recognition. In the past few years, however, research has focused on utilizing deep learning for speech-related applications. This new area of machine learning has yielded far better results than other approaches in a variety of applications, including speech, and has thus become a very attractive area of research. This paper provides a thorough examination of the studies on deep learning for speech applications conducted since 2006, when deep learning first emerged as a new area of machine learning. The review includes a statistical analysis based on specific information extracted from 174 papers published between 2006 and 2018. The results shed light on research trends in this area and bring focus to new research topics. Index Terms: speech recognition, deep neural network, systematic review.

A scalable architecture for multilingual speech recognition on embedded devices

Raab, Gruhn, Nöth

2011

Speech Communication

Please cite this article as: Raab, M., Gruhn, R., Nöth, E., A scalable architecture for multilingual speech recognition on embedded devices, Speech Communication (2010), doi: 10.1016/j.specom.2010.07.007

Abstract: In-car infotainment and navigation devices are typical examples where speech-based interfaces are successfully applied. While classical applications, such as voice commands or monolingual destination input, are monolingual, the trend is toward multilingual applications. Examples are music player control or multilingual destination input. As soon as more languages are considered, the training and decoding complexity of the speech recognizer increases. For large multilingual systems, some kind of parameter tying is needed to keep the decoding task feasible on embedded systems with limited resources. A traditional technique for this is to use a semi-continuous Hidden Markov Model as the acoustic model. The monolingual codebook on which such a system relies is not appropriate for multilingual recognition. We introduce Multilingual Weighted Codebooks that give good results with low decoding complexity. These codebooks depend on the actual language combination and increase the training complexity. Therefore, an algorithm is needed that can reduce the training complexity. Our first proposal is a set of mathematically motivated projections between Hidden Markov Models defined in Gaussian spaces. Although theoretically optimal, these projections were difficult to employ directly in speech decoders.
We found approximated projections to be most effective for practical application, giving good performance without requiring major modifications to the common speech recognizer architecture. With a combination of the Multilingual Weighted Codebooks and Gaussian Mixture Model projections, we create an efficient and scalable architecture for non-native speech recognition. Our new architecture offers a solution to the combinatorial problems of training and decoding for multiple languages. It builds new multilingual systems in only 0.002% of the time required by traditional HMM training and achieves comparable performance on foreign languages.
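The abstract names the semi-continuous HMM as the traditional way to tie parameters for embedded decoding. As a minimal sketch of that standard formulation only (not the authors' Multilingual Weighted Codebook or projection method), the emission likelihood of state j is b_j(x) = Σ_k c_jk N(x; μ_k, Σ_k): every state shares one codebook of K Gaussians and keeps only its own mixture weights c_jk. The NumPy implementation, diagonal covariances, and the function names below are illustrative assumptions.

```python
import numpy as np

def gaussian_densities(x, means, variances):
    """Evaluate every diagonal-covariance Gaussian of the shared codebook at frame x.

    x: (D,) feature vector; means, variances: (K, D). Returns densities of shape (K,).
    (Hypothetical helper for illustration.)
    """
    diff = x - means
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum(diff * diff / variances, axis=1)
    return np.exp(log_norm + log_exp)

def semicontinuous_emission(x, means, variances, state_weights):
    """Semi-continuous HMM emission b_j(x) = sum_k c_jk * N(x; mu_k, Sigma_k) for all states j.

    state_weights: (J, K) matrix of per-state codebook weights (each row sums to 1).
    (Hypothetical helper for illustration.)
    """
    # The K Gaussians are evaluated once per frame and reused by every state,
    # so per-frame Gaussian cost depends on K, not on the number of states J.
    dens = gaussian_densities(x, means, variances)
    return state_weights @ dens  # (J, K) @ (K,) -> (J,)
```

Because only the weight matrix is state-specific, adding states is cheap once the codebook is fixed; the abstract's point is that a codebook trained on one language does not cover the acoustic space of several languages well.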
