Comparative Study of Visual Feature for Bimodal Hindi Speech Recognition

Upadhyaya, Prashant; Farooq, Omar; Abidi, Musiur Raza; Varshney, Priyanka

doi:10.1515/aoa-2015-0061

Cited by 16 publications

(3 citation statements)

References 14 publications


“…Mel Filter Bank is used to convert sound signals in the frequency domain to the frequency domain mel and shows several energy quantities in the frequency range available in each filter mel. The approach in calculating mel is in frequency f (Hz) in Equation 4 [8].…”

Section: Mel Filter Bank (mentioning)

confidence: 99%


Pattern Recognition Bird Sounds Based on Their Type Using Discreate Cosine Transform (DCT) and Gaussian Methods

Nugroho, Widodo, Rachman (2019), KINETIK


Most people identify a bird's species by its shape or its sound. This study identifies patterns in the sounds of three kinds of bird: Canary Trills, Vulture and Crow. The patterns are extracted using the Discrete Cosine Transform (DCT) and Gaussian values. The sound model is built in six steps: (1) bird sound input as a WAV file, (2) Hamming windowing, (3) DFT/FFT, (4) Mel filter bank, (5) DCT, and (6) Gaussian value. The output is a set of vector values, also shown in graphical form. For recordings with the same duration and the same frequency, the same bird yields the same pattern, as confirmed by the closest distance computed with the Bray-Curtis method. For recordings with the same duration but different frequency lengths, the patterns differ.
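The processing chain listed in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited authors' implementation: the mel mapping uses the common `2595 * log10(1 + f/700)` form, which may differ from the paper's Equation 4; the function names, filter count, FFT size and sample rate are all assumptions; and step (6), the Gaussian value, is omitted because the abstract does not define it.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel mapping (assumed; the cited paper's Equation 4 may differ).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

def dct_ii(x):
    # Unnormalised DCT-II written out directly, to avoid a SciPy dependency.
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def frame_features(frame, sample_rate, n_filters=20, n_fft=512):
    # Steps (2)-(5): Hamming window, FFT power spectrum, mel filter bank, DCT.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    energies = mel_filter_bank(n_filters, n_fft, sample_rate) @ power
    return dct_ii(np.log(energies + 1e-10))  # small offset avoids log(0)

def bray_curtis(u, v):
    # Bray-Curtis distance: sum|u - v| / sum|u + v|.
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / np.abs(u + v).sum()

# Usage on a synthetic 25 ms frame (a 1 kHz tone at 16 kHz, purely illustrative).
sample_rate = 16000
t = np.arange(400) / sample_rate
features = frame_features(np.sin(2 * np.pi * 1000.0 * t), sample_rate)
```

Two frames can then be compared with `bray_curtis(features_a, features_b)`; a value near 0 indicates similar patterns under this sketch's assumptions.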


“…The researcher removes the DCT coefficient value to zero even though it indicates the value of the frame signal [8]. Elimination of zero costs because previous studies conducted are not reliable on speech recognition [10].…”

Section: Discrete Cosine Transform (DCT) (mentioning)

confidence: 99%

Pattern Recognition Bird Sounds Based on Their Type Using Discreate Cosine Transform (DCT) and Gaussian Methods

Nugroho, Widodo, Rachman (2019), KINETIK


“…The greater part of the systems utilized in this sort of highlights uses hued lips or other stamping [6], anyway this methodology is a long way from this present reality circumstances. Procedures for programmed lip form extraction have been proposed by a few creators, however to-date these have met with restricted achievement [7] and an elective methodology is to receive appearance based features extractions strategies, which can be apply reasonable change of the mouth region of interest pursued by dimensionality decrease systems, for example linear discriminant analysis as well as principle component-analysis [8]. This article writes about a correlation of visual highlights separated utilizing an appearance methodology.…”

Section: Introduction (mentioning)

confidence: 99%

An Assessment of the Visual Features Extractions for the Audio-Visual Speech Recognition

Mohmand, Perbandaran (2019), IJATCSE


Visual information from the speaker's mouth region has been shown to improve the performance of Automatic Speech Recognition (ASR) systems. This is particularly useful in the presence of noise, which even in moderate form severely degrades the recognition performance of systems that use audio information alone. Various sets of features extracted from the speaker's mouth region have been used to improve ASR performance in such challenging conditions, with varying success. To the best of the authors' knowledge, the effect of these techniques on phoneme-level recognition performance has not yet been investigated. This paper compares phoneme recognition performance using visual features extracted from the mouth region of interest with the discrete cosine transform and the discrete wavelet transform. New discrete cosine transform and discrete wavelet transform features are also extracted and compared with those used previously. These features are used alongside audio features based on Mel-Frequency Cepstral Coefficients (MFCCs). The work is intended to help in choosing suitable features for different applications and in identifying the limitations of these techniques in recognising individual phonemes.


Continuous Hindi Speech Recognition Using Kaldi ASR Based on Deep Neural Network

Upadhyaya, Mittal, Farooq, et al. (2018), Advances in Intelligent Systems and Computing
