Although many applications already analyze and recreate audio through existing lip movement recognition, researchers remain interested in developing automatic lip-reading systems that achieve better performance. Modelling of the framework has played a major role in improving the yield of sequential frameworks. In recent years there has been a lot of interest in Deep Neural Networks (DNNs), which have produced breakthrough results in various domains, including image classification, speech recognition and natural language processing. DNNs are used to represent complex functions, and they play a vital role in Automatic Lip Reading (ALR) systems. This paper mainly focuses on traditional pixel-based, shape-based and mixed feature extraction methods and their improved variants for lip-reading recognition. It highlights the most important techniques and the progression of end-to-end deep learning architectures that evolved during the past decade. The investigation points out the audio-visual databases used for analyzing and training the systems, describing them by their most common words, number of speakers, size, language length and time duration. In addition, the ALR systems developed are compared with their traditional counterparts. A statistical analysis is performed on the recognition of characters or numerals and of words or sentences in English, and the resulting performances are compared.