“…For the articulatory-to-acoustic conversion task, the input is typically electromagnetic articulography [25], ultrasound tongue imaging [6,7], permanent magnetic articulography [11], surface electromyography [13], magnetic resonance imaging [5], or video of the lip movements [16,10,3,19,17,23,22,24]. Lip-to-speech synthesis can be approached in two ways: 1) the direct approach, in which speech is generated from the input signal without an intermediate step [16,10,3,19,17]; and 2) the indirect approach, in which lip-to-text recognition is followed by text-to-speech synthesis [23,22,24].…”