Speaker Dependent and Independent Isolated Hindi Word Recognizer using Hidden Markov Model (HMM)

Bhardwaj, Ishan; Londhe, Narendra D.

doi:10.5120/8217-1639

Cited by 3 publications

(3 citation statements)

References 51 publications

“…If the phoneme has high or low resolution to the other class (interclass confusion), the visual information is complementary to the audio information due to the different shape of visemes for different classes [8,9]. However if the phoneme has confusion with the same viseme class (intraclass confusion) then visual information is not always useful because it has same viseme shape for same class.…”

Section: Introduction (mentioning)

confidence: 94%

Phoneme confusability reduction by using visual information in noisy environment

Varshney

Bansal

Farooq

2014

2014 International Conference on Signal Propagation and Computer Technology (ICSPCT 2014)

Robust speech recognition has been a prominent research area in the recent past. The important aspect of speech recognition system is phoneme identification. It is a well established fact that the performance of speech recognition system varies under different background conditions. Using visual information in speech recognition makes the system robust to the problems associated with acoustic noise. In this paper, an automated Audio Visual Phoneme Recognition (AVPR) system has been proposed and implemented for Hindi language. A set of fifty sentences is used to extract the samples of utterances of phoneme and corresponding viseme shape. Mel Frequency Cepstral coefficient (MFCC) based technique is used to form the feature set for audio signal. Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) are used to extract the visual information. Early integration technique is used to integrate the audio and visual feature set. Discrimination analysis based classifier is applied for the recognition of phonemes. To show the effect of interclass confusion associated with the viseme classes, the experiments are performed for 4 viseme classes and 8 viseme classes separately in clean and noisy background conditions. Visual information is utilized to decrease the effect of interclass confusion on phonemes. The overall maximum accuracy is 49.44% and 38.81% for 4 and 8 viseme classes, respectively, by using linear discrimination. It has also been established that an improvement of 2.91% and 6.07% is obtained by integrating visual information along with audio signal at -10 dB Signal to Noise Ratio (SNR).
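
The abstract reports results at -10 dB SNR and fuses the two modalities by early integration. As a rough illustration only (not the authors' code), the Python sketch below shows the two operations this implies: scaling a noise signal so that 10·log10(P_clean / P_noise) equals a target SNR before adding it, and early (feature-level) integration by concatenating frame-aligned audio and visual feature vectors. The function names and the 13 + 4 feature dimensions in the usage lines are hypothetical, and the sketch assumes the visual stream has already been resampled to the audio frame rate, which a real audio-visual front end must handle separately.

```python
import math
import random

def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + g*noise, with gain g chosen so 10*log10(P_clean / P_scaled_noise) == snr_db."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Target noise power is p_clean / 10**(snr_db / 10); the amplitude gain is its square root.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]

def early_integration(audio_frames, visual_frames):
    """Feature-level (early) fusion: concatenate frame-aligned audio and visual vectors."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]

if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(100)]
    noisy = add_noise_at_snr(clean, noise, -10.0)  # noise power is 10x the clean power
    # Hypothetical per-frame vectors: 13 audio coefficients + 4 visual coefficients.
    fused = early_integration([[0.0] * 13], [[1.0] * 4])
    print(len(noisy), len(fused[0]))  # 100 17
```

At -10 dB the added noise carries ten times the power of the clean signal, which is why visual features, being unaffected by acoustic noise, can add information at that operating point.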

“…The experiment has shown 79.11% recognition rate for the proposed system. Bhardwaj et al [8] has been prepared a dataset of 10 words which has been spoken by 10 speakers. MFCC and HMM are used for the feature extraction and classification respectively.…”

Section: Introduction (mentioning)

confidence: 99%

Phoneme confusability reduction by using visual information in noisy environment

Varshney

Bansal

Farooq

2014

2014 International Conference on Signal Propagation and Computer Technology (ICSPCT 2014)
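
The statement above summarizes the cited recognizer as MFCC features classified with HMMs. A common way to use HMMs for isolated words is to train one model per word and choose the model with the highest likelihood for an utterance. The Python sketch below illustrates that decision rule with the scaled forward algorithm on a discrete HMM. It assumes the MFCC frames have already been vector-quantized into codebook indices; the two-state models and the labels word_a and word_b are hypothetical toy values. A continuous-density HMM, as many MFCC-based systems use, would replace the emission-table lookup with a Gaussian (mixture) density. This is not a reconstruction of Bhardwaj et al.'s implementation.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) for a discrete-observation HMM, via the scaled forward algorithm.

    start[i]    = P(first state is i)
    trans[i][j] = P(next state is j | current state is i)
    emit[i][k]  = P(symbol k | state i)
    Normalizing alpha at every frame avoids underflow on long utterances; the sum
    of the logs of the normalizers equals the log-likelihood.
    """
    n = len(start)
    log_lik = 0.0
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def recognize(obs, word_models):
    """Return the word whose HMM (start, trans, emit) scores the sequence highest."""
    return max(word_models,
               key=lambda w: forward_log_likelihood(obs, *word_models[w]))

if __name__ == "__main__":
    # Hypothetical 2-state left-to-right models over a 3-symbol VQ codebook.
    left_to_right = [[0.6, 0.4], [0.0, 1.0]]
    models = {
        "word_a": ([1.0, 0.0], left_to_right, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
        "word_b": ([1.0, 0.0], left_to_right, [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
    }
    print(recognize([0, 0, 2, 2], models))  # word_a
```

Working in the log domain with per-frame normalization is a design choice for numerical safety: multiplying raw probabilities over hundreds of frames would underflow to zero in floating point.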

“…These attributes help identify the speaker and speech features [4]. Although speech recognition and speaker recognition are different fields, the feature extraction methods in both fields overlap [5]. These methods include predictive models based on the linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficient (MFCC) and relative spectra filtering (RASTA).…”

Section: Introduction (mentioning)

confidence: 99%
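
Of the methods listed in the statement above, MFCC is the one used by the cited Hindi recognizer. Two steps give MFCCs their name: filterbank energies are measured on the mel scale, and a discrete cosine transform (DCT-II) of their logarithms yields cepstral coefficients. The Python sketch below shows only those two steps, using the widely used HTK-style conversion mel(f) = 2595·log10(1 + f/700); other mel variants exist. Framing, windowing, the FFT power spectrum, and applying the triangular filters are omitted, and the function names and the 26-filter, 0 to 8000 Hz settings are illustrative assumptions.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel conversion; other mel formulas (e.g. Slaney's) also exist."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 frequencies (Hz), equally spaced in mel.

    Filter m is a triangle rising from edges[m] to a peak at edges[m + 1]
    and falling to zero at edges[m + 2].
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]

def cepstra_from_log_energies(log_energies, n_coeffs):
    """Unnormalized DCT-II of log filterbank energies; keeps the first n_coeffs terms."""
    n = len(log_energies)
    return [sum(e * math.cos(math.pi * k * (i + 0.5) / n)
                for i, e in enumerate(log_energies))
            for k in range(n_coeffs)]

if __name__ == "__main__":
    edges = mel_filter_edges(n_filters=26, f_low=0.0, f_high=8000.0)
    print(len(edges))  # 28
    # A flat log spectrum puts all its weight in c0; higher cepstra are ~0.
    print(cepstra_from_log_energies([1.0] * 26, 4))
```

Because the filters are equally spaced in mel rather than in Hz, they are narrow at low frequencies and wide at high frequencies, which is the perceptual warping that distinguishes MFCC from a plain cepstrum.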

Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks

Kaur

Srivastava

Kumar

2018

JTIT

Huge growth is observed in the speech and speaker recognition field due to many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair helps control the chair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed. Then, feature optimization is performed using GA. In the second phase training is conducted using DNN. Evaluation and validation of the proposed work model is done by setting a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. Also, this paper presents an evaluation of such feature extraction methods as linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), with all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.
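
The abstract optimizes MFCC features with a genetic algorithm before DNN training but does not spell out the encoding. One common reading is GA-based feature selection, sketched below in Python under that assumption: each individual is a binary mask over the feature dimensions, and the population evolves by tournament selection, one-point crossover, bit-flip mutation, and elitism. The fitness function is a hypothetical stand-in; in a setup like the one described it would typically be a classifier's validation accuracy on the selected features. All function names and parameter values are illustrative.

```python
import random

def _tournament(pop, fitness, rng):
    """Pick two random individuals and return the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def ga_select_features(n_features, fitness, pop_size=20, generations=30,
                       crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve binary masks (1 = keep feature) that maximize fitness(mask).

    Returns (best_mask, history), where history[g] is the best fitness after
    generation g (history[0] is the initial population). Elitism copies the best
    mask into every new generation, so history never decreases.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            p1 = _tournament(pop, fitness, rng)
            p2 = _tournament(pop, fitness, rng)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

if __name__ == "__main__":
    # Hypothetical toy fitness: features 0, 3 and 5 are "useful"; each kept feature costs 0.1.
    useful = {0, 3, 5}

    def toy_fitness(mask):
        return sum(mask[i] for i in useful) - 0.1 * sum(mask)

    mask, history = ga_select_features(8, toy_fitness, seed=1)
    print(mask, round(history[-1], 2))
```

Elitism is what guarantees the best score never drops between generations; without it, crossover and mutation can destroy the best mask found so far. The size penalty in the toy fitness mirrors the usual goal of feature optimization: fewer inputs for the downstream classifier at little or no cost in accuracy.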

An Optimal Speech Recognition Module for Patient's Voice Monitoring System in Smart Healthcare Applications

Krishnaveni

Subashini

Gracy

et al. 2018

2018 Renewable Energies, Power Systems & Green Inclusive Economy (REPS-GIE)
