Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips

Hueber, Thomas; Benaroya, Elie-Laurent; Chollet, G.; Denby, B.; Dreyfus, Gérard; Stone, Maureen

doi:10.1016/j.specom.2009.11.004

Cited by 161 publications

(117 citation statements)

References 22 publications

Supporting

Mentioning

114

Contrasting

Unclassified


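“…Silent speech transmission methods include those that use electromyographic signals generated around the mouth [2], those that acquire speech with a NAM (Non-Audible Microphone) attached near the mouth [3], those that use magnets and magnetic field sensors [4], those that use ultrasound images of the oral and nasal cavities [5], those that use GHz microwaves [6], and those that use ultrasonic signals [7]. Kalgaonkar et al. [8] reported an average recognition rate of 88.4 % [10] when simple hand gestures were recognized using ultrasonic Doppler, and a recognition rate of 91.7 % when gait patterns were recognized.…”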

Section: Speech recognition using ultrasonic Doppler (unclassified)

Automatic speech recognition using acoustic doppler signal

Lee

2016

The Journal of the Acoustical Society of Korea


This paper proposes a new automatic speech recognition (ASR) method that uses ultrasonic Doppler signals instead of conventional speech signals. Compared with conventional speech-based ASR, the proposed method offers robustness against acoustic noise and greater user comfort, because the sensor is non-contact. A 40 kHz ultrasonic signal is radiated toward the mouth, the reflected ultrasonic signals are received, and the frequency shift caused by the Doppler effect is used to implement ASR. Unlike the previous method, which employed a single-channel ultrasonic signal, the proposed method employs multi-channel ultrasonic signals acquired from various locations. Principal component analysis (PCA) coefficients serve as the ASR features, and a left-to-right hidden Markov model (HMM) is adopted. To verify the feasibility of the proposed ASR, a speech recognition experiment was carried out on 60 Korean isolated words obtained from six speakers. The results showed that the overall word recognition rates were comparable with those of conventional speech-based ASR methods and that the proposed method outperformed the conventional single-channel ASR method. In particular, an average recognition rate of 90 % was maintained in noisy environments.


“…Many different SSIs have been proposed so far, mainly differing in the type of biosignal they rely on. Thus, we can find SSIs that exploit the electrical signals generated by the neurons in the brain [23] or in the articulator muscles [31,42,49] or the movement of the speech articulators themselves [40,44,9,29,18,14,26,21]. In our work we use a magnetic sensing technique known as Permanent Magnet Articulography (PMA) [13,18] for capturing the movement of the speech articulators.…”

Section: Introduction (mentioning)

confidence: 99%

Voice Restoration After Laryngectomy Based on Magnetic Sensing of Articulator Movement and Statistical Articulation-to-Speech Conversion

González, Cheah, Gilbert, et al. 2017

Biomedical Engineering Systems and Technologies


Abstract. In this work, we present a silent speech system that is able to generate audible speech from captured movement of speech articulators. Our goal is to help laryngectomy patients, i.e. patients who have lost the ability to speak following surgical removal of the larynx, most frequently due to cancer, to recover their voice. In our system, we use a magnetic sensing technique known as Permanent Magnet Articulography (PMA) to capture the movement of the lips and tongue by attaching small magnets to the articulators and monitoring the magnetic field changes with sensors close to the mouth. The captured sensor data is then transformed into a sequence of speech parameter vectors from which a time-domain speech signal is finally synthesised. The key component of our system is a parametric transformation which represents the PMA-to-speech mapping. Here, this transformation takes the form of a statistical model (a mixture of factor analysers, more specifically) whose parameters are learned from simultaneous recordings of PMA and speech signals acquired before laryngectomy.
To evaluate the performance of our system on voice reconstruction, we recorded two PMA-and-speech databases with different phonetic complexity for several non-impaired subjects. Results show that our system is able to synthesise speech that sounds like the original voice of the subject and is also intelligible. However, more work still needs to be done to achieve consistent synthesis for phonetically rich vocabularies.


“…Although still in developmental stages (e.g., speaker-dependent recognition, small-vocabulary), SSIs even have potential to provide speech output based on prerecorded samples of the patient's own voice (Green et al, 2011; Wang et al, 2009). Potential articulatory data acquisition methods for SSIs include ultrasound (Denby et al, 2011; Hueber et al, 2010), surface electromyography electrodes (Heaton et al, 2011; Jorgensen and Dusan, 2010), and electromagnetic articulograph (EMA) (Fagan et al, 2008; Wang et al, 2009, 2012a).…”

Section: Introduction (mentioning)

confidence: 99%

“…So far, most of the published work on SSIs has focused on development of silent speech recognition algorithm through offline analysis (i.e., using prerecorded data) (Fagan et al, 2008; Heaton et al, 2011; Hofe et al, 2013; Hueber et al, 2010; Jorgenson et al, 2010; Wang et al, 2009a, 2012a, 2012b, 2013c). Ultrasound-based SSIs have been tested online with multiple subjects and encouraging results were obtained in a phrase reading task where the subjects were asked to silently articulate sixty phrases (Denby et al, 2011).…”

Section: Introduction (mentioning)

confidence: 99%

Proceedings of the 5th Workshop on Speech and Language Processing for Assistive Technologies

2014


Augmentative Alternative Communication (AAC) policy suffers from a lack of large scale quantitative evidence on the demographics of users and diversity of devices. The 2013 Domesday Dataset was created to aid formation of AAC policy at the national level. The dataset records purchases of AAC technology by the UK's National Health Service between 2006 and 2012, giving information for each item on: make, model, price, year of purchase, and geographic area of purchase. The dataset was designed to help answer open questions about the provision of AAC services in the UK, and the level of detail of the dataset is such that it can be used at the research level to provide context for researchers and to help validate (or not) assumptions about everyday AAC use. This paper examines three different ways of using the Domesday Dataset to provide verified evidence to support, or refute, assumptions, uncover important research problems, and to properly map the technological distinctiveness of a user community.
Introduction. Technical researchers in the AAC community are required to make certain assumptions about the state of the community when choosing research projects that are calculated to make the most effective use of research resources for the greatest possible benefit. A particular issue is estimating how easily technical research can achieve wide scale adoption or commercial impact. For example, (Szekely et al., 2012) uses a webcam and facial analysis to allow a user to control expressive features of their synthetic speech by means of facial expressions. Such work is clearly useful, but it is difficult to assess its potential commercial impact without also knowing what proportion of currently available AAC devices include webcams and how that proportion is changing over time. Similarly, corpus-based approaches such as (Mitchell and Sproat, 2012) could potentially be brought to market very quickly, but that potential can only be assessed if we also have some awareness of the range and popularity of AAC devices that either have space for such a corpus or the internet capability to access one. Unfortunately, even though there are a range of AAC-focused meta-studies in the literature (see, for example, (Pennington et al., 2003; Pennington et al., 2004; Hanson et al., 2004; Alwell and Cobb, 2009)) they give little information on the technical landscape of AAC. This paper examines three issues of interest to technical researchers in AAC, each from a different stage in the research lifecycle. It then shows how the Domesday Dataset (Reddington, 2013) can provide evidence to support, or refute, assumptions, uncover important research problems, and map the technological distinctiveness of a user community. This paper is structured as follows. Section 2 introduces the Domesday Dataset and discusses the context in which it is used in this work. Section 3 examines the issue that little is known about the prevalence of equipment within the AAC user community, and because of this lack of information it is difficult to establish ba...
