“…i.e., such a speech recognizer should extract features from facial parts, especially lip movement, and from facial expressions. Several attempts have been made to detect speech from lip movement (Almajai, Cox, Harvey, & Lan, 2016; Chung, Senior, Vinyals, & Zisserman, 2017; Dupont & Luettin, 2000; Petridis & Pantic, 2016; Sui, Bennamoun, & Togneri, 2015; Wand, Koutnik, & Schmidhuber, 2016; Yau, Kumar, & Weghorn, 2007; Zhou, Zhao, Hong, & Pietikäinen, 2014), but the results are much lower than those of audio speech recognizers (Fernandez-Lopez & Sukno, 2018).…”