In this technological era, smart and intelligent systems integrated with artificial intelligence (AI) techniques, algorithms, tools, and technologies have an impact on many aspects of our daily lives. Communication and interaction between humans and machines through speech is becoming increasingly important, since speech is an obvious substitute for keyboards and screens in the communication process. Numerous technologies therefore take advantage of speech, such as Automatic Speech Recognition (ASR), where natural human speech in many languages serves as the means of interacting with machines. The majority of related works on ASR concentrate on the development and evaluation of ASR systems that serve a single language (monolingual) only, such as Arabic, English, Chinese, or French. However, research attempts that combine multiple languages (bilingual and multilingual) during the development and evaluation of ASR systems are very limited. This paper provides a comprehensive research background and the fundamentals of bilingual ASR, and reviews related works that have combined two languages for ASR tasks from 2010 through 2021. It also formulates a research taxonomy and discusses open challenges in bilingual ASR research. Based on our literature investigation, it is clear that bilingual ASR using deep learning approaches is in high demand and can provide acceptable performance. In addition, many combinations of two languages, such as Arabic-English and Arabic-Malay, have not yet been attempted by the research community, which opens new research opportunities. Finally, it is clear that ASR research is moving towards not only bilingual ASR but also multilingual ASR.
This paper presents a method to improve a language model for a low-resource language by using statistical machine translation from a related language to generate data for the target language. In this work, the machine translation model is trained on a corpus of parallel Mandarin-Cantonese subtitles and used to translate a large set of Mandarin conversational telephone transcripts into Cantonese, for which resources are limited. The translated transcripts are used to train a more robust language model for speech recognition and for keyword search in Cantonese conversational telephone speech. This method enables the keyword search system to detect 1.5 times more out-of-vocabulary words and achieves a 1.7% absolute improvement in actual term-weighted value.
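As a rough illustration of the data-augmentation idea described above (not the authors' actual system), the sketch below stands in for the SMT step with a naive word-by-word lookup in a hypothetical Mandarin-to-Cantonese phrase table, then trains a toy bigram language model on the translated output. All names, transcripts, and table entries here are invented placeholders; a real pipeline would use a phrase-based decoder and a smoothed n-gram or neural language model.

```python
from collections import Counter

def translate(sentence, phrase_table):
    # Naive word-by-word "translation" standing in for a real SMT system.
    # Unknown words pass through unchanged, mimicking OOV handling.
    return [phrase_table.get(w, w) for w in sentence.split()]

# Hypothetical Mandarin->Cantonese phrase table learned from parallel subtitles.
phrase_table = {"shi": "hai", "wo": "ngo", "de": "ge"}

# Toy source-language (Mandarin) conversational transcripts.
mandarin_transcripts = ["wo shi xuesheng", "zhe shi wo de shu"]

# Generate synthetic target-language (Cantonese) training text.
cantonese_text = [translate(s, phrase_table) for s in mandarin_transcripts]

# Train a simple bigram language model on the translated data.
bigram_counts = Counter()
unigram_counts = Counter()
for sent in cantonese_text:
    tokens = ["<s>"] + sent + ["</s>"]
    unigram_counts.update(tokens[:-1])
    bigram_counts.update(zip(tokens[:-1], tokens[1:]))

def bigram_prob(prev, word):
    # Maximum-likelihood estimate; real systems would apply smoothing.
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("<s>", "ngo"))  # → 0.5
```

The payoff in the paper comes from scale: translating a large transcript corpus expands the target-language vocabulary and n-gram coverage, which is what lets the keyword search system detect more out-of-vocabulary terms.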
Building a voice-operated system for learning-disabled users is a difficult task that requires considerable time and effort. Owing to the wide spectrum of disabilities and their different associated phonopathies, most available approaches target a specific pathology. This may improve their accuracy for some users but makes them unsuitable for others. In this paper, we present a cross-lingual approach to adapting a general-purpose modular speech recognizer for learning-disabled people. The main advantage of this approach is that it allows rapid and cost-effective development by taking an already built speech recognition engine and its modules and exploiting existing resources for standard speech in different languages to recognize the users' atypical voices. Although recognizers built with the proposed technique obtain lower accuracy rates than those trained for specific pathologies, they can be used by a wide population and developed more rapidly, which makes it possible to design various types of speech-based applications accessible to learning-disabled users.