Speaker- and language-independent speech recognition in mobile communication systems

Viikki, I.; Kiss, Imre; Tian, Jilei

doi:10.1109/icassp.2001.940753

Cited by 13 publications

(3 citation statements)

References 6 publications


“…However, it has been proven that taking the maximum of all of the mixtures, instead of the sum, is a very good approximation of the result [17]. Hence, by taking the negative of the logarithm of (1) in order to convert probabilities into costs and applying the previous approximation, the acoustic cost can be evaluated by where α is a coefficient per mixture that encompasses all constants and parameters outside of the exponential function in (1). In order to further simplify the computation of the Gaussian block, the variance σ² is replaced by a new variable v as described in (3).…”

Section: Gaussian Calculation (mentioning)

confidence: 99%
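
The max-over-mixtures approximation quoted in the Gaussian Calculation statement can be illustrated with a small sketch. Equations (1) and (3) of the citing paper are not reproduced on this page, so the code assumes a standard diagonal-covariance Gaussian mixture and assumes the substitution v = 1/(2σ²); the function name `gaussian_costs` and its parameters are hypothetical and not taken from either paper.

```python
import math


def gaussian_costs(x, means, variances, weights):
    """Return (exact_cost, max_approx_cost) for one diagonal-covariance GMM.

    Costs are negative log-likelihoods, so a lower cost is a better match.
    """
    per_mixture = []
    for mu, var, w in zip(means, variances, weights):
        # alpha: every term outside the exponential, expressed as a cost.
        alpha = -math.log(w) + 0.5 * sum(math.log(2 * math.pi * s) for s in var)
        # Assumed form of the variance substitution: v = 1 / (2 * sigma^2).
        v = [1.0 / (2.0 * s) for s in var]
        dist = sum((xi - mi) ** 2 * vi for xi, mi, vi in zip(x, mu, v))
        per_mixture.append(alpha + dist)

    # Approximation: keep only the best (lowest-cost) mixture.
    best = min(per_mixture)
    # Exact cost: -log(sum_m exp(-cost_m)), computed in a numerically stable way.
    exact = best - math.log(sum(math.exp(best - c) for c in per_mixture))
    return exact, best


if __name__ == "__main__":
    exact, approx = gaussian_costs(
        x=[0.0, 0.0],
        means=[[0.0, 0.0], [5.0, 5.0]],
        variances=[[1.0, 1.0], [1.0, 1.0]],
        weights=[0.5, 0.5],
    )
    print(f"exact cost = {exact:.6f}, max-approximation cost = {approx:.6f}")
```

Under these assumptions the approximate cost is never below the exact cost and exceeds it by at most the logarithm of the number of mixtures, which is why one dominant mixture makes the two nearly identical. Replacing the sum with a minimum over costs also removes the exponential and logarithm from the per-frame computation.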

“…Large vocabulary speaker independent systems have potential in all forms of computing, from hand held mobile devices to personal computing and even large scale data centres. A low power, real-time embedded system could dramatically impact our daily interactions with digital mobile technology [1] while a faster than real-time multi-stream batch decoder could be used in server applications for distributed systems [2] or data-mining [3,4].…”

Section: Introduction (mentioning)

confidence: 99%


FPGA Implementation of a Pipelined Gaussian Calculation for HMM-Based Large Vocabulary Speech Recognition

Veitch

Aubert

Woods

et al. 2011

International Journal of Reconfigurable Computing


A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1, reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms which coupled with a backend search of 5000 words has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.


“…Due to globalization as well as the international nature of the markets and the future applications, speaker independence implies the development and use of language independent automatic speaker recognition to avoid logistic difficulties. Hence, they proposed architecture for embedded multilingual speech recognition systems [17]. Rama Murty and Yegnanarayana [8] combined the evidences from the residual phase and MFCC methods used for speaker recognition and obtained very good results.…”

Section: Introduction (mentioning)

confidence: 99%

Combination of Features for Multilingual Speaker Identification with the Constraint of Limited Data

B.G.

Jayanna

2013

IJCA


In the modern day digital automated world, speaker identification system plays a very important role in the field of fast growing internet based communications/transactions. In this paper, speaker identification in the context of mono, cross and multilingual are demonstrated using the two different feature extraction techniques, i.e., Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC) with the constraint of limited data. The languages considered for the study are English (international language), Hindi (national language) and Kannada (regional language). Since the standard multilingual database is not available, experiments are carried out on our own created database of 30 speakers in the college laboratory environment who can speak the three different languages. In case of limited data condition, owing to less data the existing techniques in each stage may not provide good performance. To alleviate the problem of limited data, the vocal tract feature extracted from MFCC and LPCC techniques are combined. As a result the combination of features gives nearly 30% higher performance compared to the individual features for a set of 30 speakers.


ASR in portable wireless devices

Viikki

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

