A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation

Tanaka, Kou; Toda, Tomoki; Neubig, Graham; Sakti, Sakriani; Nakamura, Satoshi

doi:10.1587/transinf.e97.d.1429

Cited by 25 publications

(19 citation statements)

References 14 publications

(19 reference statements)

“…11. As reported in [9], we confirmed that Batch is significantly improved compared with EL by predicting F0 patterns based on statistical F0 patterns. For our proposed methods RT and Forthcoming, we achieved that two proposed systems caused no degradation compared with Batch.…”

Section: Naturalness Of Predicted F0 Patterns (supporting)

confidence: 87%

“…We found that reducing the variability of F0 patterns such as rapid movements, we achieved to train F0 patterns with a smaller number of mixture components. Moreover, as reported in [9], we also confirmed that CF0 brings better performance compared with the original F0 because continuous sequence makes it possible to consider inter-frame correlation over an utterance. The proposed segmented CF0 preserves such an improvement relatively well while minimizing degradation of the prediction accuracy.…”

Section: Best Number Of Mixture Components (supporting)

confidence: 87%

“…Note that after predicting CF0 patterns over all frames, only silence frames are automatically detected by using waveform power [9].…”

Section: Batch-type Prediction Process (mentioning)

confidence: 99%

“…To improve naturalness of EL speech, we have proposed several EL speech enhanced methods based on statistical voice conversion techniques [7]-[9]. In these methods, acoustic features of EL speech are converted into those of normal speech using Gaussian mixture models (GMMs) [7]-[9].…”

Section: Introduction (mentioning)

confidence: 99%

“…In these methods, acoustic features of EL speech are converted into those of normal speech using Gaussian mixture models (GMMs) [7]-[9]. We have shown that F0 pattern replacement from the mechanically generated ones into those predicted from the spectral sequence of the EL speech using the GMM significantly improves naturalness of EL speech while preserving its intelligibility [9]. On the other hand, the use of these enhancement methods needs to use a loudspeaker to present the enhanced EL speech.…”

Section: Introduction (mentioning)

confidence: 99%

A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction

Tanaka; Toda; Nakamura

2017

IEICE Trans. Inf. & Syst.

Self Cite

SUMMARY This paper presents a novel speaking aid system to help laryngectomees produce more natural-sounding electrolaryngeal (EL) speech. An electrolarynx is an external device that generates excitation signals in place of vocal fold vibration. Although conventional EL speech is quite intelligible, its naturalness suffers from the unnatural fundamental frequency (F0) patterns of the mechanically generated excitation signals. To improve the naturalness of EL speech, we have proposed EL speech enhancement methods using statistical F0 pattern prediction. In these methods, the original EL speech recorded by a microphone is enhanced and then presented from a loudspeaker. These methods are effective in some situations, such as telecommunication, but they are not suitable for face-to-face conversation because listeners hear not only the enhanced EL speech but also the original EL speech. In this paper, to develop an EL speech enhancement method that is also effective for face-to-face conversation, we propose a method for directly controlling the F0 patterns of the excitation signals generated by the electrolarynx using statistical F0 prediction. To get an "actual feel" for the proposed system, we also implement a prototype system. Using the prototype, we find latency issues caused by real-time processing. To address these latency issues, we further propose segmental continuous F0 pattern modeling and forthcoming F0 pattern modeling. Through evaluations by simulation, we demonstrate that the proposed system effectively addresses both the latency issues and the naturalness issues of the electrolarynx.
