“…Ideally, these interfaces would record the articulation and synthesize speech based on the movement of the articulatory organs, without the user of the device actually producing any sound. The typical input of AAM can be a video of the lip movements [3,4,5,6,7,8], ultrasound tongue imaging (UTI) [3,9,10,11,12,13,14,15,16,17], or one of several other modalities (e.g., MRI, EMA, PMA, EOS, radar, or multimodal recordings). All of these articulatory tracking devices are highly sensitive to 1) the alignment of the recording equipment across sessions and 2) the anatomy of the actual speaker.…”