Many multimedia applications and entertainment industry products such as games, cartoons and film dubbing require speech-driven face animation and audio-video synchronization. An Automatic Speech Recognition (ASR) system alone does not give good results in a noisy environment. An Audio-Visual Speech Recognition (AVSR) system plays a vital role in such harsh environments, as it uses both audio and visual information. In this paper, we propose a novel approach with enhanced performance over the traditional methods reported so far. Our algorithm works on the basis of acoustic and visual parameters to achieve better results. We have tested our system for the English language using the LPC, MFCC and PLP parameters of the speech. Lip parameters such as lip width and lip height are extracted from the video, and both the acoustic and visual parameters are used to train classifiers such as Artificial Neural Networks (ANN), Vector Quantization (VQ), Dynamic Time Warping (DTW) and Support Vector Machines (SVM). We have employed a neural network in our research work with the LPC, MFCC and PLP parameters. Results show that our system gives a very good response for the tested vowels.
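The feature fusion described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the landmark coordinates and the zeroed MFCC frame are placeholder values, and a real front end would supply per-frame LPC/MFCC/PLP coefficients and tracked lip landmarks.

```python
import math

# Hypothetical lip landmark coordinates (x, y) in pixels -- illustrative
# values only, not taken from the paper's dataset.
left_corner   = (120.0, 200.0)
right_corner  = (180.0, 202.0)
top_center    = (150.0, 188.0)
bottom_center = (150.0, 214.0)

def euclidean(p, q):
    """Distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Geometric lip parameters of the kind the abstract mentions.
lip_width  = euclidean(left_corner, right_corner)
lip_height = euclidean(top_center, bottom_center)

# Placeholder acoustic feature vector (e.g. 13 MFCCs for one frame);
# real values would come from an MFCC, LPC or PLP front end.
mfcc_frame = [0.0] * 13

# Fuse acoustic and visual parameters into one feature vector that a
# classifier (ANN, SVM, VQ codebook, DTW template) could be trained on.
feature_vector = mfcc_frame + [lip_width, lip_height, lip_height / lip_width]

print(len(feature_vector))   # 13 acoustic + 3 visual = 16
```

Concatenating the two streams into a single vector per frame is one common early-fusion strategy; the classifiers listed in the abstract can all consume such vectors directly.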