2021
DOI: 10.1145/3494987
SpeeChin

Abstract: This paper presents SpeeChin, a smart necklace that can recognize 54 English and 44 Chinese silent speech commands. A customized infrared (IR) imaging system mounted on the necklace captures images of the neck and face from under the chin. These images are first pre-processed and then fed to an end-to-end deep convolutional recurrent neural network (CRNN) model to infer the silent speech commands. A user study with 20 participants (10 per language) showed that SpeeChin could…
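The abstract's pipeline (per-frame IR images encoded by a CNN, aggregated over time by a recurrent network, then classified into commands) can be sketched as below. This is a minimal illustrative sketch, not the authors' actual architecture: the layer sizes, the GRU choice, the 64×64 input resolution, and the 54-way output are assumptions for demonstration.

```python
# Hedged sketch of an end-to-end convolutional-recurrent network (CRNN) that
# classifies a sequence of single-channel infrared chin images into one of
# num_commands silent speech commands. All hyperparameters are illustrative.
import torch
import torch.nn as nn


class SpeechCRNN(nn.Module):
    def __init__(self, num_commands: int = 54, hidden: int = 128):
        super().__init__()
        # Per-frame CNN encoder: 1-channel IR image -> 512-dim feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # 32 * 4 * 4 = 512
        )
        # GRU aggregates the per-frame features across the image sequence.
        self.rnn = nn.GRU(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_commands)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, 1, H, W) -- a clip of IR frames per sample.
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # (b, t, 512)
        _, h = self.rnn(feats)        # h: (num_layers=1, batch, hidden)
        return self.head(h[-1])       # (batch, num_commands) logits


model = SpeechCRNN()
logits = model(torch.randn(2, 10, 1, 64, 64))  # 2 clips, 10 frames each
print(logits.shape)  # torch.Size([2, 54])
```

Flattening the batch and time axes before the CNN lets one encoder process every frame in parallel; only the final GRU hidden state feeds the classifier, matching a whole-clip command-recognition setting rather than frame-by-frame transcription.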

Cited by 42 publications (5 citation statements) · References 51 publications
“…Lipreading is a technology that utilizes a camera to visually capture movement around the mouth and interpret speech from the image sequence. HCI researchers have proposed to use devices such as smartphones [45,58] and wearable cameras [6,34,69] to provide mobile silent speech interaction, as well as multimodal approaches such as using silent speech to facilitate eye-gaze-based selection [57].…”
Section: Silent Speech Interface
confidence: 99%
“…Additionally, there is a lack of a practical activating method to initiate silent speech input. Previous methods such as offline segmentation [6,34,69] or trigger buttons [45,52] are not feasible for hands-free real-time interactions, and MOD-based methods can be vulnerable to misactivations [57,58]. We propose a novel few-shot transfer learning paradigm to enable customizable silent speech commands.…”
Section: Machine Learning Approaches to Lipreading Interfaces
confidence: 99%
“…Therefore, a wide variety of indirect secondary carriers of information about voice commands have been explored. Many of these techniques achieve high accuracy at the price of being highly invasive because they rely on placing sensors (e.g., magnetic [2], surface electromyographic [3], infrared [4], electropalatographic [5], electromagnetic [6,7]) directly on the human's body to detect subtle vibrations that are correlated with the speech production. Obviously, such contact-based approaches are oftentimes inconvenient and, moreover, incompatible with large-scale deployment in our daily lives.…”
Section: Introduction
confidence: 99%