2019
DOI: 10.30534/ijatcse/2019/27852019
An Assessment of the Visual Features Extractions for the Audio-Visual Speech Recognition

Abstract: Use of visual information from the speaker's mouth region has been shown to improve the performance of Automatic Speech Recognition (ASR) systems. This is particularly valuable in the presence of noise, which even in moderate form severely degrades the recognition performance of systems that use audio information alone. Various sets of features extracted from the speaker's mouth area have been used to improve the performance of an ASR system. In such challenging situations and hav…

Cited by 1 publication (1 citation statement)
References 8 publications
“…Various studies have revealed that the information contained in the speech signal is strongly related to that found in lip movements, and that incorporating information about the latter can improve the recognition performance of both humans and machines. In noisy environments, humans can reduce speech recognition errors by using the speaker's lip movements [1], and indeed many people with hearing difficulties rely on lip reading for most of the speech information they receive. There are two fundamental issues that must be addressed in designing and implementing a lip-reading system: the first is the choice of visual features, while the second is the development of an effective technique for extracting those features from the video stream.…”
Section: Introduction (mentioning)
confidence: 99%
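The two issues raised in the citing statement (which visual features to use, and how to extract them from video) are often illustrated with appearance-based features computed on a mouth region of interest. The sketch below is not the method of the cited paper; it is a minimal illustration that assumes OpenCV Haar-cascade face detection, treats the lower half of the detected face box as a crude mouth ROI, and uses low-order 2D-DCT coefficients of the resized ROI as the per-frame visual feature vector.

```python
# Illustrative sketch only: appearance-based visual features for lip reading.
# Assumptions (not from the cited paper): Haar-cascade face detection, lower
# half of the face box as the mouth ROI, 2D-DCT coefficients as features.
import cv2
import numpy as np
from scipy.fftpack import dct

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_roi(gray_frame):
    """Return a crude mouth region: lower half of the largest detected face."""
    faces = face_detector.detectMultiScale(gray_frame, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return gray_frame[y + h // 2 : y + h, x : x + w]

def dct_features(roi, size=(32, 32), n_coeffs=36):
    """Resize the ROI and keep the top-left block of 2D-DCT coefficients."""
    roi = cv2.resize(roi, size).astype(np.float32)
    coeffs = dct(dct(roi, axis=0, norm="ortho"), axis=1, norm="ortho")
    k = int(np.sqrt(n_coeffs))
    return coeffs[:k, :k].flatten()

def video_features(path):
    """Extract one visual feature vector per frame of a video file."""
    cap = cv2.VideoCapture(path)
    feats = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        roi = mouth_roi(gray)
        if roi is not None and roi.size > 0:
            feats.append(dct_features(roi))
    cap.release()
    return np.array(feats)
```

The ROI size and number of retained DCT coefficients here are placeholder values; in practice such parameters are tuned, and the per-frame vectors are typically normalized and augmented with temporal derivatives before being passed to the audio-visual recognizer.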