“…In contrast, shape-based feature extraction assumes that most speechreading information is contained in the contours of the speaker's lips, or more generally in the face contours, e.g., jaw and cheek shape, in addition to the lips [48]. This category includes geometric-type features, such as mouth height, width, and area [19], [22], [26], [28], [29], [32]–[35], [49]–[52]; Fourier and image moment descriptors of the lip contours [28], [53]; statistical models of shape, such as active shape models [48], [54]; and other parameters of lip-tracking models [44], [55]–[57]. Finally, features from both categories can be concatenated into a joint shape and appearance vector [27], [44], [58], [59], or a joint statistical model can be learned on such vectors, as in the case of the active appearance model [60], used for speechreading in [48].…”
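
As an illustration of the geometric-type features mentioned in the excerpt, the following is a minimal sketch, not the method of any cited work. It assumes the lip contour is already available as an ordered list of (x, y) points, takes mouth width and height as the extents of the contour's bounding box, and computes the area with the shoelace formula. The function names `geometric_lip_features` and `joint_feature_vector` are hypothetical, and the appearance vector is a stand-in for whatever pixel-based features a real system would use.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def geometric_lip_features(contour: Sequence[Point]) -> List[float]:
    """Return [width, height, area] for a closed lip contour.

    Assumption: width and height are bounding-box extents of the contour;
    the cited papers may define these measurements differently.
    """
    if len(contour) < 3:
        raise ValueError("a closed contour needs at least three points")

    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    # Shoelace formula: sum signed cross products of consecutive vertices,
    # wrapping from the last point back to the first.
    twice_area = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    area = abs(twice_area) / 2.0

    return [width, height, area]


def joint_feature_vector(shape: Sequence[float],
                         appearance: Sequence[float]) -> List[float]:
    """Concatenate shape and appearance features into one vector."""
    return list(shape) + list(appearance)


if __name__ == "__main__":
    # Toy diamond-shaped "mouth" contour, listed in order around the outline.
    diamond = [(0, 1), (2, 0), (4, 1), (2, 2)]
    shape = geometric_lip_features(diamond)
    # Hypothetical appearance features, e.g. a few projection coefficients.
    print(joint_feature_vector(shape, [0.5, 0.25]))
```

In practice the two feature groups have different scales, so a system along these lines would likely normalize each group before concatenation; that step is omitted here.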