2015
DOI: 10.1515/aoa-2015-0061

Comparative Study of Visual Feature for Bimodal Hindi Speech Recognition

Abstract: In building speech-recognition-based applications, robustness to different noisy background conditions is an important challenge. In this paper, a bimodal approach is proposed to improve the robustness of a Hindi speech recognition system. The importance of different types of visual features is also studied for an audio-visual automatic speech recognition (AVASR) system under diverse noisy audio conditions. Four sets of visual features based on the Two-Dimensional Discrete Cosine Transform (2D-DCT), Principal Compone…

Cited by 16 publications (3 citation statements)
References 14 publications

“…Mel Filter Bank is used to convert sound signals in the frequency domain to the frequency domain mel and shows several energy quantities in the frequency range available in each filter mel. The approach in calculating mel is in frequency f (Hz) in Equation 4 [8].…”
Section: Mel Filter Bank; citation type: mentioning
confidence: 99%
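
The statement above refers to an "Equation 4" that is not reproduced on this page. As a rough illustration only, the sketch below assumes the commonly used mapping mel = 2595 · log10(1 + f / 700); whether the citing paper uses exactly this form is not confirmed here. The function names (`hz_to_mel`, `mel_to_hz`, `mel_filter_centres`) are hypothetical and chosen for this example.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (assumed 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: map a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centres(f_min: float, f_max: float, n_filters: int) -> list:
    """Centre frequencies (Hz) of n_filters spaced evenly on the mel scale.

    The band edges f_min and f_max are excluded, so the mel range is split
    into n_filters + 1 equal steps.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Illustrative values only: 10 filters between 0 Hz and 4 kHz.
    for centre in mel_filter_centres(0.0, 4000.0, 10):
        print(round(centre, 1))
```

Because the centres are evenly spaced in mel rather than in Hz, they sit closer together at low frequencies and spread out at high frequencies, which is the property a mel filter bank relies on.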
“…The researcher removes the DCT coefficient value to zero even though it indicates the value of the frame signal [8]. Elimination of zero costs because previous studies conducted are not reliable on speech recognition [10].…”
Section: Discrete Cosine Transform (DCT); citation type: mentioning
confidence: 99%
“…The greater part of the systems utilized in this sort of highlights uses hued lips or other stamping [6], anyway this methodology is a long way from this present reality circumstances. Procedures for programmed lip form extraction have been proposed by a few creators, however to-date these have met with restricted achievement [7] and an elective methodology is to receive appearance based features extractions strategies, which can be apply reasonable change of the mouth region of interest pursued by dimensionality decrease systems, for example linear discriminant analysis as well as principle component-analysis [8]. This article writes about a correlation of visual highlights separated utilizing an appearance methodology.…”
Section: Introduction; citation type: mentioning
confidence: 99%
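
The appearance-based approach described in the statement above (a transform of the mouth region of interest followed by dimensionality reduction) can be sketched with the 2D-DCT, one of the feature types named in the abstract. This is a minimal sketch under stated assumptions, not the method of the paper or of the citing article: the region of interest is assumed to be a small grayscale block given as nested lists, and dimensionality reduction is done here by simply keeping the top-left (low-frequency) k × k coefficients rather than by the linear discriminant analysis or principal component analysis mentioned in the statement. All function names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]


def low_freq_features(block, k):
    """Keep the k x k low-frequency corner as a flat feature vector."""
    coeffs = dct_2d(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


if __name__ == "__main__":
    # Hypothetical 4 x 4 mouth-region block of pixel intensities.
    roi = [
        [52, 55, 61, 66],
        [63, 59, 55, 90],
        [62, 59, 68, 113],
        [63, 58, 71, 122],
    ]
    print(low_freq_features(roi, 2))
```

Truncating to the low-frequency corner is the simplest way to shorten the vector; a learned projection such as PCA or LDA would replace `low_freq_features` in a fuller pipeline.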