Robust speech recognition by normalization of the acoustic space

Acero, Alejandro; Stern, Richard M.

doi:10.1109/icassp.1991.150483

Cited by 47 publications

(22 citation statements)

References 9 publications

“…The selection of warping function is sometimes accomplished by choosing from a set of candidate functions in a fashion that maximizes the likelihood of the observations, and sometimes directly on the basis of speaker-specific speech parameters. In a relatively early study, Acero blindly estimated the optimal frequency-distortion parameter for the bilinear transform to accomplish frequency warping for LPC-derived cepstra (Acero 1993; Acero and Stern 1991). This technique produced 12% decrease in the relative error rate on the CMU speaker-independent alphanumeric census task.…”

Section: Estimation Of Warping Factor (mentioning)

confidence: 98%

Time scale modification and vocal tract length normalization for improving the performance of Tamil speech recognition system implemented using language independent segmentation algorithm

Saraswathi

Geetha

2006

Int J Speech Technol

This paper describes the work done in improving the performance of Tamil speech recognition system by using Time Scale Modification (TSM) and Vocal Tract Length Normalization (VTLN) techniques. The speech recognition system for Tamil language was developed using a new approach of text independent speech segmentation, with a phoneme based language model for recognition. There is degradation in the performance of speech recognition due to variations in the speaking rate and vocal tract shape among different speakers. In order to improve the performance of speech recognition system, both TSM and VTLN normalization techniques were used in this work. The TSM was implemented using the Phase vocoder approach and the VTLN was implemented using speaker specific bark/mel scale in bark/mel domain. The performance of Tamil speech recognition system was improved by performing both TSM and VTLN normalization techniques.

“…One of the early attempts to obtain a LT was by Acero et al (Acero 1990; Acero & Stern 1991). They proposed the use of bilinear warping for achieving variable frequency warping for speaker normalization.…”

Section: Review Of Existing Approaches To Obtain LT (mentioning)

confidence: 99%

“…Motivated by the work of Acero (1990) and Acero & Stern (1991) and based on the observation that frequency warping functions used in most VTLN methods can be approximated to a reasonable degree by the bilinear transform, McDonough et al (1998) suggested the use of conformal maps such as bilinear transform and its generalizations for speaker normalization. Since the unit circle is mapped back onto the unit circle, McDonough refers to these conformal maps as all-pass systems; such systems have uniform frequency response and thus pass signals of all frequencies with neither attenuation nor amplification.…”

Section: Review Of Existing Approaches To Obtain LT (mentioning)

confidence: 99%

Studies on inter-speaker variability in speech and its application in automatic speech recognition

Umesh

2011

Sadhana

In this paper, we give an overview of the problem of inter-speaker variability and its study in many diverse areas of speech signal processing. We first give an overview of vowel-normalization studies that minimize variations in the acoustic representation of vowel realizations by different speakers. We then describe the universal-warping approach to speaker normalization which unifies many of the vowel normalization approaches and also shows the relation between speech production, perception and auditory processing. We then address the problem of inter-speaker variability in automatic speech recognition (ASR) and describe techniques that are used to reduce these effects and thereby improve the performance of speaker-independent ASR systems.
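
The citation statements above refer to bilinear (first-order all-pass) frequency warping. As a minimal sketch, assuming the common form A(z) = (z^-1 - alpha) / (1 - alpha z^-1) with |alpha| < 1, the Python below computes the warped frequency and checks the all-pass property numerically. The function names, the sign convention, and the example value of alpha are illustrative assumptions; this is not the parameter-estimation procedure of Acero and Stern or of McDonough et al.

```python
import cmath
import math


def bilinear_warp(omega, alpha):
    """Warped frequency (radians) for the first-order all-pass map.

    Assumes |alpha| < 1 and 0 <= omega <= pi. With this sign convention,
    alpha > 0 pushes frequencies upward; alpha = 0 is the identity map.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def allpass_response(omega, alpha):
    """Frequency response of A(z) = (z^-1 - alpha) / (1 - alpha z^-1)."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - alpha) / (1.0 - alpha * z_inv)


if __name__ == "__main__":
    alpha = 0.3  # hypothetical warping parameter, chosen only for illustration
    for k in range(5):
        omega = k * math.pi / 4.0
        warped = bilinear_warp(omega, alpha)
        gain = abs(allpass_response(omega, alpha))
        print(f"omega={omega:.4f}  warped={warped:.4f}  |A|={gain:.6f}")
```

The gain stays at one for every frequency, which is the "neither attenuation nor amplification" property described in the quoted statement; only the phase, and hence the frequency axis, is remapped.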

“…Accordingly, using different microphones in the training mode and the recognition mode causes performance degradation (32,34). Several methods have been proposed to cope with this problem (35,36).…”

Section: Robust Algorithms (mentioning)

confidence: 99%

What does voice-processing technology support today?

Nakatsu

Suzuki

1995

Proc. Natl. Acad. Sci. U.S.A.

This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. Software development environment, which is a key technology in developing applications software, ranging from DSP software to support software also is described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions.

Recently, voice-processing technology has been greatly improved. There is a large gap between the present voice-processing technology and that of 10 years ago. The speech recognition and synthesis market, however, has lagged far behind technological progress. This paper describes the state of the art in voice-processing technology applications and points out several problems concerning market growth that need to be solved.

Technologies related to applications can be divided into two categories. One is system technologies and the other is speech recognition and synthesis algorithms.

Hardware and software technologies are the main topics for system development. Hardware technologies are very important because any speech algorithm is destined for implementation on hardware. Technology in this area is advancing quickly. Almost all speech recognition/synthesis algorithms can be used with a microprocessor and several DSPs. With the progress of device technology and parallel architecture, hardware technology will continue to improve and will be able to cope with the huge number of calculations demanded by improved algorithms of the future.

Also, software technologies are an important factor, as algorithms and application procedures should be implemented by the use of software technology. In this paper, therefore, software technology will be treated as an application development tool. Along with the growth areas of application of voice-processing technology, various architectures and tools that support applications development have been devised. Also, when speech processing is the application target, it is important to keep in mind the characteristics peculiar to speech. Speech communication basically is of a nature that it should work in a real-time interactive mode. Computer systems that handle speech communications with users should have an ability to cope with these operations. Several issues concerning real-time interactive communication will be described.

For algorithms there are two important issues concerning application. One is the evaluation of algorithms, and the other is the robustness of algorithms under adverse conditions. Evaluation of speech recognition and synthesis algorithms has been one of the main topics in the research area. However, to consider applicatio...
