“…Based on recent advances in machine learning-based technologies, the conversion of biosignals to speech signals has been reported in several studies [3], [4], [5]. Various signals have been considered for speech generation and enhancement, including surface electromyography (sEMG) [3], [6], electromagnetic articulography (EMA) [4], [7], permanent magnetic articulography (PMA) [5], [8], ultrasound tongue imaging [9], [10], Doppler signals [11], [12], visual cues [13], [14], and bone-conducted microphone signals [15]. Further, multimodal learning has been leveraged to integrate information from complementary data, such as text [16], videos [13], bone-conducted microphone signals [15], and articulatory movements [4]. However, the transformation of articulatory movements to facilitate communication has not yet been adequately researched.…”