2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
DOI: 10.1109/asru46091.2019.9003804
Improving Fundamental Frequency Generation in EMG-to-Speech Conversion Using a Quantization Approach

Cited by 10 publications (6 citation statements)
References 12 publications
“…Silent Computational Paralinguistics (SCP) reveal paralinguistics for situations when audible acoustic signals are not available or advisable, e. g., due to privacy concerns or disturbance of others, adverse noise conditions, or speech pathologies. While SSIs have previously addressed Automatic Speech Recognition, e. g., from video, EMG, or ultrasound [5], or examined how to synthesize silent to audible speech, e. g., for laryngectomy patients [6,7,8], research on privacy for paralinguistic analysis has focused mostly on whispered speech [9,10,11]. Some research has explored EMG for emotion recognition [12], and facial expressions to enhance human-computer interaction [13] or human-robot interaction [14].…”
Section: Introduction
confidence: 99%
“…For example, Recurrent Neural Networks (RNNs) have been used to predict natural F0 patterns for alaryngeal voices based on conventional spectral features in [28] and [8]. Similarly, Diener et al in [29] developed a prediction system based on electromyographic signals.…”
Section: B. VC-Based Statistical F0 Prediction and Voicing State Control
confidence: 99%
“…In contrast to regression modeling where target F0 values form a continuous output variable, in classification predictive modeling, output variables are discrete classes or levels. To put it differently, classification modeling turns the task of predicting a real value for F0 into predicting the most probable quantization level that the F0 falls into [29]. Therefore, by solving a multi-level classification problem, the mapping function from input features to discretized target F0 patterns can be approximated.…”
Section: Classification Predictive Modeling
confidence: 99%
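The quantization idea described in that citation statement can be sketched as a small round-trip: discretize an F0 contour into a fixed set of levels (class labels for a classifier), then map predicted labels back to frequencies via the level centers. The bin count, F0 range, log spacing, and the dedicated unvoiced class below are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def make_f0_bins(f0_min=50.0, f0_max=400.0, n_levels=32):
    # Log-spaced quantization levels across a plausible F0 range (Hz);
    # range and level count are illustrative assumptions.
    edges = np.geomspace(f0_min, f0_max, n_levels + 1)
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric mid-points
    return edges, centers

def quantize_f0(f0, edges):
    # Map each voiced F0 value to a bin index (class label);
    # unvoiced frames (f0 <= 0) get a dedicated extra class.
    f0 = np.asarray(f0, dtype=float)
    labels = np.digitize(f0, edges[1:-1])     # 0 .. n_levels-1
    labels[f0 <= 0] = len(edges) - 1          # unvoiced class = n_levels
    return labels

def dequantize_f0(labels, centers):
    # Replace each predicted class with its bin center; unvoiced -> 0 Hz.
    labels = np.asarray(labels)
    f0 = np.zeros(labels.shape)
    voiced = labels < len(centers)
    f0[voiced] = centers[labels[voiced]]
    return f0

edges, centers = make_f0_bins()
f0 = np.array([0.0, 100.0, 220.0, 395.0])       # toy contour, Hz
labels = quantize_f0(f0, edges)                  # classifier targets
recon = dequantize_f0(labels, centers)           # reconstructed contour
```

A classifier (e.g. a neural network over EMG-derived features) would then be trained to predict `labels` per frame, reducing continuous F0 regression to a multi-class problem as the quoted passage describes.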
“…Prosody is mainly conditioned by the airflow and the vibration of the vocal folds, which in the case of laryngectomised patients is not possible to recover. As a result, most direct synthesis techniques generating a voice from sensed articulatory movements can, at best, recover a monotonous voice with limited pitch variations [101], [281], [282]. The use of complementary information capable of restoring prosodic features is thus an important area for future research.…”
Section: A. Improved Sensing Techniques
confidence: 99%