In this paper, we present our recent studies of F0 estimation from surface electromyographic (EMG) data using a Gaussian mixture model (GMM)-based voice conversion (VC) technique, referred to as EMG-to-F0. In our approach, a support vector machine classifies individual frames as unvoiced or voiced (U/V), and the F0 contours of voiced frames are estimated by the trained GMM under the minimum mean-square error criterion. EMG-to-F0 is experimentally evaluated using three data sets from different speakers, each containing almost 500 utterances. Objective experiments demonstrate a correlation coefficient of up to 0.49 between estimated and target F0 contours with more than 84% U/V decision accuracy, although the results vary widely.
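The two-stage idea described above (a U/V decision followed by a GMM-based minimum mean-square error mapping for voiced frames) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single scalar input feature per frame, takes the U/V decisions as given booleans rather than running a support vector machine, and all names (`mmse_convert`, `estimate_f0`, the component dictionary keys) are hypothetical. For a joint GMM over source feature x and target y, the MMSE estimate is the posterior-weighted sum of each component's conditional mean.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_convert(x, components):
    """MMSE estimate of y given x under a joint GMM over (x, y).

    Each component is a dict with hypothetical keys:
      w      mixture weight
      mu_x   mean of the source feature
      mu_y   mean of the target (F0) value
      var_x  variance of the source feature
      cov_xy covariance between source and target
    """
    # Posterior probability of each component given the source feature.
    likelihoods = [c["w"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]
    # Posterior-weighted sum of the per-component conditional means E[y | x, m].
    return sum(
        p * (c["mu_y"] + c["cov_xy"] / c["var_x"] * (x - c["mu_x"]))
        for p, c in zip(posteriors, components)
    )


def estimate_f0(features, voiced_flags, components):
    """Apply the GMM mapping to voiced frames; unvoiced frames get F0 = 0.

    voiced_flags stands in for the output of a U/V classifier (assumption).
    """
    return [
        mmse_convert(x, components) if voiced else 0.0
        for x, voiced in zip(features, voiced_flags)
    ]
```

In a realistic setting the source feature would be a multidimensional EMG feature vector and the GMM parameters would be trained on parallel EMG and F0 data; the scalar form here only shows the shape of the computation.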
There are several problems associated with using existing electrolarynxes. For example, the loud volume of the device itself might disturb smooth interpersonal communication, and the speech it generates is unnatural. To improve the quality of speech communication using such a medical device, this paper proposes a novel speech communication aid system for total laryngectomees. The system uses a new sound source as an alternative to the existing electrolarynx; the articulated speech it produces is detected through the soft tissues of the head with a nonaudible murmur (NAM) microphone attached to the surface of the skin [Nakajima et al., Proc. Interspeech 2005, pp. 293–296 (2005)]. The new sound source outputs signals of extremely low energy that cannot be heard by people near the speaker. Such body-transmitted artificial speech is converted to a more natural voice by statistical voice conversion [T. Toda and K. Shikano, Proc. Interspeech 2005, pp. 1957–1960 (2005)]. The performance of the proposed system is evaluated in terms of objective and subjective measures using body-transmitted artificial speech simulated by a non-disabled speaker. Experimental results show that body-transmitted artificial speech is consistently converted to a much more natural and intelligible voice. [Work supported by SCOPE-S.]
The physical characteristics of weak body-conducted vocal-tract resonance signals called non-audible murmur (NAM) and the acoustic characteristics of three sensors developed for detecting these signals have been investigated. NAM signals are attenuated by 50 dB at 1 kHz; this attenuation consists of 30-dB full-range attenuation due to air-to-body transmission loss and −10 dB/octave spectral decay due to sound propagation loss within the body. These characteristics agree with the spectral characteristics of measured NAM signals. The sensors have a sensitivity of between −41 and −58 dB [V/Pa] at 1 kHz, and the mean signal-to-noise ratio of the detected signals was 15 dB. On the basis of these investigations, three types of silent-speech enhancement systems were developed: (1) simple, direct amplification of weak vocal-tract resonance signals using a wired urethane-elastomer NAM microphone, (2) simple, direct amplification using a wireless urethane-elastomer-duplex NAM microphone, and (3) transformation of the weak vocal-tract resonance signals sensed by a soft-silicone NAM microphone into whispered speech using statistical conversion. Field testing of the systems showed that they enable voice-impaired people to communicate verbally using body-conducted vocal-tract resonance signals. Listening tests demonstrated that weak body-conducted vocal-tract resonance sounds can be transformed into intelligible whispered speech sounds. Using these systems, people with voice impairments can re-acquire speech communication with less effort.
Keywords: non-audible murmur, body-conducted sound, voice conversion, talking aids
Introduction
While microphones have been used in many scientific fields to sense speech sounds, a recently developed device called a "non-audible murmur (NAM) microphone" is receiving increasing attention as a new means for picking up body-conducted speech (Heracleous et al., 2003; Nakajima et al., 2006; Toda et al., 2005b; Nakamura et al., 2006).
Typically, the speech sound used is airborne sound, which consists of small, fast vibrations of the air. Vibration of the air column in the vocal tract vibrates the tract wall, and some of the sound energy generated passes through the tissues of the neck and chest. The body-conducted sound that travels through the neck tissue can be sensed using a sensor modified from a microphone. In fact, speech sounds propagate not only through the air and the bone but also through the body tissue, including the muscles. Nakajima, a pioneer in NAM development, found that murmured speech, which is usually unheard by people nearby, can be detected by using a sensor attached to the neck behind the ear (Nakajima et al., 2003a). This body-conducted weak murmur is called "non-audible murmur (NAM)", and the sensor is known as a NAM microphone. Nakajima originally detected NAM using a stethoscopic NAM microphone (Nakajima et al., 2003a, 2003b), which is an electret condenser microphone (ECM) implanted into a standard medical-use stethoscope with the tubes removed. When this...