Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances

Saon, George; Chien, Jen‐Tzung

doi:10.1109/msp.2012.2197156

Cited by 110 publications

(46 citation statements)

References 82 publications

“…Many papers were dedicated to presenting an overview of the advances in LVCSR: [4,5,6,7,8]. However, the scope of this paper focuses primarily on the systems architecture, the techniques used and the key issues.…”

Section: Introduction (mentioning)

confidence: 99%

Recent advances in LVCSR : A benchmark comparison of performances

Errattahi

2017

IJECE

Large Vocabulary Continuous Speech Recognition (LVCSR), which is characterized by a high variability of the speech, is the most challenging task in automatic speech recognition (ASR). Believing that the evaluation of ASR systems on relevant and common speech corpora is one of the key factors that help accelerate research, we present, in this paper, a benchmark comparison of the performances of the current state-of-the-art LVCSR systems over different speech recognition tasks. Furthermore, we objectively identify the best performing technologies and the best accuracy achieved so far in each task. The benchmarks have shown that Deep Neural Networks and Convolutional Neural Networks have proven their efficiency on several LVCSR tasks by outperforming the traditional Hidden Markov Models and Gaussian Mixture Models. They have also shown that, despite the satisfying performances in some LVCSR tasks, the problem of large-vocabulary speech recognition is far from being solved in some others, where more research efforts are still needed.

Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: Rahhal Errattahi, Laboratory of Information Technology, National School of Applied Sciences, University of Chouaib Doukkali, EL Jadida, Morocco, errattahi.r@ucd.ac.ma

INTRODUCTION

Speech is a natural and fundamental communication vehicle which can be considered as one of the most appropriate media for human-machine interactions. The aim of Automatic Speech Recognition (ASR) systems is to convert a speech signal into a sequence of words, either for text-based communication purposes or for device control. ASR is usually used when the keyboard becomes inconvenient, for example when our hands are busy or have limited mobility, when we are using the phone, when we are in the dark, or when we are moving around.

ASR finds application in many different areas: dictation, meeting and lecture transcription, speech translation, voice search, phone-based services and others. Those systems are, in general, extremely dependent on the data used for training the models, the configuration of front-ends, etc. Hence a large part of system development usually involves investigating appropriate configurations for a new domain, new training data, and a new language.

There are several speech recognition tasks, and the difference between these tasks rests mainly on: (i) the speech type (isolated or continuous speech), (ii) the speaker mode (speaker dependent or independent), (iii) the vocabulary size (small, medium or large) and (iv) the speaking style (read or spontaneous speech). Even though ASR has matured to the point of commercial applications, the Speaker Independent Large Vocabulary Continuous Speech Recognition tasks (commonly designated as LVCSR) pose a particular challenge to ASR technology developers. Three of the major problems that arise when LVCSR systems are being developed are: first, speaker-independent systems require a large amount of training data in order to cover speak...

“…Recent improvements in ASR techniques have led to high-accuracy speech recognition systems [3], [36]. Over the past 20 years in particular, model training techniques have gradually migrated from maximum-likelihood (ML) estimation approaches to discriminative training techniques [2], [26], [29], [38].…”

Section: Introduction (mentioning)

confidence: 99%

Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones

Tachioka

Watanabe

Roux

et al. 2017

Journal of Information Processing

Reverberant and noisy automatic speech recognition (ASR) using distant stereo microphones is a very challenging but desirable scenario for home-environment speech applications. This scenario can often provide prior knowledge, such as physical information about the sound sources and the environment, which may then be used to reduce the influence of the interference. We propose a method to enhance the binary masking algorithm by using prior distributions of the time difference of arrival. This paper also validates state-of-the-art ASR techniques that include various discriminative training and feature transformation methods. Furthermore, we develop an efficient method to combine discriminative language modeling and minimum Bayes risk decoding in the ASR post-processing stage. We also investigate the effectiveness of this method when used for reverberated and noisy ASR with deep neural networks (DNNs), as well as when used in systems that combine multiple DNNs using different features. Experiments on the medium vocabulary sub-task of the second CHiME challenge show that the system submitted to the challenge achieved a 26.86% word error rate (WER); moreover, the DNN system with discriminative training, speaker adaptation and system combination achieves a 20.40% WER.

“…some speech recognizers works using MFCC features while others works in parallel using PLP features, or several HMM based recognizers are used with different training and most likely acoustic states search strategies implemented, etc.) [3], [4], [5]. In the case of Lithuanian voice command recognition hybrid approach is important also because it may potentially enable to use foreign language trained speech recognition engine adapted to recognize Lithuanian commands with the proprietary Lithuanian speech recognizer.…”

Section: Introduction (mentioning)

confidence: 99%

Recognition of Voice Commands Using Hybrid Approach

Rudzionis

Ratkevicius

Rudzionis

et al. 2013

Communications in Computer and Information Science

Computerized systems with voice user interfaces could save time and ease the work of healthcare practitioners. To achieve this goal, the voice user interface should be reliable (recognizing commands with high enough accuracy) and properly designed (convenient for the user). The paper deals with implementation issues of a hybrid approach to voice command recognition. By the hybrid approach we mean the combination of several different recognition methods to achieve higher recognition accuracy. The experimental results show that most voice commands are recognized well enough, but there is a set of voice commands whose recognition is more difficult. In this paper a novel method based on the Ripper algorithm is proposed for combining several recognition methods. Experimental evaluation showed that this method achieves higher recognition accuracy than the application of a blind combination rule.
