Silent speech decoding is a novel application of brain–computer interfaces (BCIs) based on articulatory neuromuscular activity, reducing the difficulty of data acquisition and processing. In this paper, we investigate spatial features and decoders for recognizing these neuromuscular signals. Surface electromyography (sEMG) data were recorded from human subjects during mimed speech. Specifically, we propose to combine transfer learning and deep learning by transforming the sEMG data into spectrograms, which carry rich information in both the time and frequency domains and capture interactions across channels. For transfer learning, an Xception model pre-trained on a large image dataset is used for feature generation. Three deep learning methods, a Multi-Layer Perceptron, a Convolutional Neural Network, and a bidirectional Long Short-Term Memory network, are then trained on the extracted features and evaluated on recognizing the articulatory muscle movements for the words in our vocabulary. The proposed decoders successfully recognized the silent speech, and the bidirectional Long Short-Term Memory achieved the best accuracy of 90%, outperforming the other two algorithms. The experimental results demonstrate the validity of spectrogram features and deep learning algorithms.
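A minimal sketch of the pipeline this abstract describes: fixed-length sEMG frames are turned into spectrogram images, a frozen ImageNet-pretrained Xception generates per-frame features, and a bidirectional LSTM classifies the word. The channel count, sampling rate, vocabulary size, frame count, and all function names below are illustrative assumptions, not the paper's exact settings.

```python
# Sketch (assumed parameters): sEMG -> spectrogram image -> frozen Xception
# features -> bidirectional LSTM word classifier.
import numpy as np
from scipy.signal import spectrogram
import tensorflow as tf

NUM_WORDS = 10             # assumed vocabulary size
FRAMES_PER_UTTERANCE = 6   # assumed fixed-length spectrogram frames per word

def semg_to_rgb_spectrogram(frame, fs=1000):
    """Turn one (channels, samples) sEMG frame into a 3-channel image."""
    _, _, sxx = spectrogram(frame, fs=fs, nperseg=64, noverlap=48)
    sxx = np.log1p(sxx.mean(axis=0))                 # average channels, log-compress
    sxx = (sxx - sxx.min()) / (np.ptp(sxx) + 1e-8)   # normalize to [0, 1]
    img = tf.image.resize(sxx[..., None], (71, 71))  # Xception's minimum size
    return tf.image.grayscale_to_rgb(img)            # replicate to 3 channels

# Frozen ImageNet-pretrained Xception as a fixed feature generator (2048-d).
backbone = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False

def extract_features(utterance_frames):
    """List of (channels, samples) frames -> (T, 2048) feature sequence."""
    imgs = tf.stack([semg_to_rgb_spectrogram(f) for f in utterance_frames])
    imgs = tf.keras.applications.xception.preprocess_input(imgs * 255.0)
    return backbone(imgs, training=False)

# Bidirectional LSTM over the per-frame Xception features.
classifier = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(FRAMES_PER_UTTERANCE, 2048)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(NUM_WORDS, activation="softmax"),
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```

Freezing the backbone means only the small recurrent classifier is trained, which is the usual motivation for transfer learning when the labeled sEMG dataset is small.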
Based on surface electromyography (sEMG), we developed a novel method to recognize six types of primary human taste sensations, achieving a recognition accuracy of 74.46%. The sEMG signals were acquired under six stimuli: no taste substance, distilled vinegar, white granulated sugar, instant coffee powder, refined salt, and Ajinomoto (monosodium glutamate). The signals were then preprocessed in the following steps: sample augmentation, removal of trend items, high-pass filtering, and adaptive power-line notch filtering. A random forest classifier achieved a five-fold cross-validation accuracy of 74.46%, demonstrating the feasibility of the recognition task. To further improve model performance, we examined the impact of feature dimension, electrode distribution, and subject diversity. Accordingly, we provide an optimized feature combination that reduces the number of feature types from 21 to 4, a preferable selection of electrode positions that reduces the number of channels from 6 to 4, and an analysis of the relation between subject diversity and model performance. This study offers guidance for further research on taste sensation recognition with sEMG.
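The preprocessing-plus-random-forest pipeline above can be sketched with SciPy and scikit-learn as below. The filter cutoffs, the 50 Hz notch frequency, the four time-domain features, and the toy stand-in data are assumptions for illustration; the paper's exact parameters and 21-feature pool are not reproduced here.

```python
# Sketch (assumed parameters): detrend -> high-pass -> power-line notch ->
# four time-domain features -> random forest with five-fold cross-validation.
import numpy as np
from scipy.signal import detrend, butter, iirnotch, filtfilt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FS = 1000  # assumed sampling rate (Hz)

def preprocess(sig):
    """sig: (channels, samples) sEMG recording for one taste stimulus."""
    sig = detrend(sig, axis=-1)                    # remove trend items
    b, a = butter(4, 20, btype="highpass", fs=FS)  # assumed 20 Hz cutoff
    sig = filtfilt(b, a, sig, axis=-1)
    bn, an = iirnotch(50.0, Q=30.0, fs=FS)         # power-line notch (50 Hz)
    return filtfilt(bn, an, sig, axis=-1)

def features(sig):
    """Four illustrative time-domain features per channel, concatenated."""
    mav = np.mean(np.abs(sig), axis=-1)                     # mean absolute value
    rms = np.sqrt(np.mean(sig ** 2, axis=-1))               # root mean square
    wl = np.sum(np.abs(np.diff(sig, axis=-1)), axis=-1)     # waveform length
    zc = np.sum(sig[..., :-1] * sig[..., 1:] < 0, axis=-1)  # zero crossings
    return np.concatenate([mav, rms, wl, zc])

# Toy stand-in data: 60 samples, 4 channels, 6 taste labels (0..5).
rng = np.random.default_rng(0)
raw = [rng.standard_normal((4, 2 * FS)) for _ in range(60)]
X = np.stack([features(preprocess(s)) for s in raw])
y = rng.integers(0, 6, size=len(raw))

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # five-fold CV accuracy
```

The reported reduction from 21 feature types to 4, and from 6 channels to 4, would correspond here to shrinking the `features` function and the channel dimension of the recordings.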
Silent speech decoding (SSD), based on articulatory neuromuscular activity, has become a prominent brain–computer interface (BCI) task in recent years. Many works have been devoted to decoding the surface electromyography (sEMG) generated by articulatory neuromuscular activity. However, restoring silent speech in tonal languages such as Mandarin Chinese remains difficult. This paper proposes an optimized sequence-to-sequence (Seq2Seq) approach to synthesize voice from sEMG-based silent speech. We extract duration information to align the sEMG-based silent speech with the corresponding audio length, and then use a deep learning model with an encoder–decoder structure together with a state-of-the-art vocoder to generate the audio waveform. Experiments with six Mandarin Chinese speakers demonstrate that the proposed model can successfully decode silent speech in Mandarin Chinese, achieving an average character error rate (CER) of 6.41% under human evaluation.
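To make the encoder–decoder-plus-vocoder structure concrete, here is a minimal PyTorch sketch that maps an sEMG feature sequence to mel-spectrogram frames, with duration regulation realized as simple interpolation to the audio length. The layer sizes, the interpolation-based length regulator, and the vocoder note are assumptions for illustration; the paper's exact architecture and duration mechanism may differ.

```python
# Sketch (assumed architecture): sEMG features -> biLSTM encoder ->
# length regulation to the audio duration -> LSTM decoder -> mel frames.
import torch
import torch.nn as nn

class EMG2MelSeq2Seq(nn.Module):
    def __init__(self, emg_dim=64, hidden=256, mel_dim=80):
        super().__init__()
        self.encoder = nn.LSTM(emg_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, mel_dim)

    def regulate_length(self, enc, target_len):
        """Duration regulation: stretch encoder outputs to the audio length."""
        # (B, T_emg, D) -> (B, target_len, D) via linear interpolation.
        return nn.functional.interpolate(
            enc.transpose(1, 2), size=target_len, mode="linear",
            align_corners=False).transpose(1, 2)

    def forward(self, emg_feats, target_len):
        enc, _ = self.encoder(emg_feats)      # (B, T_emg, 2*hidden)
        enc = self.regulate_length(enc, target_len)
        dec, _ = self.decoder(enc)            # (B, target_len, hidden)
        return self.proj(dec)                 # predicted mel frames

# Toy usage: one utterance, 200 sEMG frames stretched to 320 mel frames.
model = EMG2MelSeq2Seq()
emg = torch.randn(1, 200, 64)
mel = model(emg, target_len=320)
loss = nn.functional.l1_loss(mel, torch.randn(1, 320, 80))  # vs. ground-truth mel
# A pretrained neural vocoder (e.g., HiFi-GAN) would then convert the
# predicted mel-spectrogram into the final audio waveform.
```

Regulating the sEMG sequence to the audio length before decoding lets the model be trained frame-to-frame against the ground-truth mel-spectrogram without a learned alignment.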