2019
DOI: 10.3389/fnins.2019.00060

Keyword Spotting Using Human Electrocorticographic Recordings

Abstract: Neural keyword spotting could form the basis of a speech brain-computer interface for menu navigation if it can be done with low latency and high specificity, comparable to the “wake-word” functionality of modern voice-activated AI assistant technologies. This study investigated neural keyword spotting using motor representations of speech via invasively recorded electrocorticographic signals as a proof of concept. Neural matched filters were created from monosyllabic consonant-vowel utterances: one keyword utt…
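The matched-filter idea named in the abstract (build a template from keyword utterances, slide it over ongoing neural activity, and flag threshold crossings) lends itself to a compact sketch. The NumPy code below is a minimal illustration only, not the authors' implementation; the function names (build_template, matched_filter_score, detect_onsets) and the assumption of high-gamma envelope features are hypothetical.

```python
import numpy as np

def build_template(epochs):
    # epochs: (n_trials, n_channels, n_samples) high-gamma envelopes,
    # time-locked to keyword utterance onset (assumed feature choice).
    template = epochs.mean(axis=0)
    # Zero-mean each channel so the filter keys on temporal shape,
    # not baseline offset.
    return template - template.mean(axis=1, keepdims=True)

def matched_filter_score(signal, template):
    # signal: (n_channels, n_total_samples); template: (n_channels, n_t).
    # Returns one detection score per alignment of the template.
    n_ch, n_t = template.shape
    scores = np.zeros(signal.shape[1] - n_t + 1)
    for ch in range(n_ch):
        # Sliding dot product of each channel with its template row.
        scores += np.correlate(signal[ch], template[ch], mode="valid")
    return scores

def detect_onsets(scores, threshold):
    # Rising threshold crossings mark candidate keyword detections.
    above = scores >= threshold
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1
```

In use, one would compute matched_filter_score on continuous held-out recordings and pass the result to detect_onsets with a threshold chosen for high specificity (see the calibration sketch after the citation statements below).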

Cited by 18 publications (14 citation statements) · References 24 publications (39 reference statements)
“…Electrocorticographic (ECoG) signals recorded from the cortical surface are well-suited for this purpose due to their broad coverage of multiple cortical areas (Herff and Schultz, 2016). Using ECoG, laryngeal activity (Dichter et al., 2018), phonetic features (Mesgarani et al., 2014; Lotte et al., 2015), articulatory gestures (Chartier et al., 2018; Mugler et al., 2018), phonemes (Mugler et al., 2014; Ramsey et al., 2017), words (Kellis et al., 2010; Milsap et al., 2019), and continuous sentences (Herff et al., 2015; Moses et al., 2016, 2018) have been investigated. To provide speech-impaired patients with the full expressive power of speech, it is crucial to include acoustic, prosodic, and linguistic cues.…”
Section: Introduction (mentioning; confidence: 99%)
“…Both demonstrated that decoded activity can be used to resynthesize speech with high correlation to the original speech signal, with the former showing that vowels can be decoded more accurately using anatomical information than with traditional methods [88–90]. However, the most successful demonstrations to date are arguably by Akbari et al. [91] and Milsap et al. [92]. Akbari et al. [91] demonstrated that highly intelligible speech can be resynthesized from auditory cortex (A1); the best performance was achieved using deep learning, both low- and high-frequency information, and a vocoder target, which provided 75% intelligibility and a 65% relative improvement over the linear-classifier, spectrogram-target baseline.…”
Section: Feasibility of Speech BCI (mentioning; confidence: 99%)
“…On the other hand, Milsap et al. [92] demonstrated that 100% neural voice-activity detection (VAD) of isolated syllables is achievable through a simple matched-filter approach with less than ~1 s latency, with decoding performance likely limited only by electrode placement and density of coverage. Further, they observed that activity in the vSMC best discriminates place of articulation and consonant voicing, whereas activity in the STG best discriminates vowel height [92].…”
Section: Feasibility of Speech BCI (mentioning; confidence: 99%)
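Since both the abstract and the citation statement above stress wake-word-like specificity, a natural complement to the earlier sketch is to calibrate the detection threshold on keyword-free baseline data. This is a hedged sketch under that assumption; the false-positive budget (fp_per_minute) and the function calibrate_threshold are hypothetical, not taken from [92].

```python
import numpy as np

def calibrate_threshold(rest_scores, fs, fp_per_minute=0.1):
    # rest_scores: matched-filter scores (see sketch above) computed on
    # keyword-free baseline recordings; fs: score sample rate in Hz.
    # Pick the threshold whose empirical exceedance count on baseline
    # stays within the false-positive budget. Counting individual
    # samples above threshold over-counts sustained crossings, so the
    # estimate is conservative (it favors specificity).
    n_minutes = len(rest_scores) / fs / 60.0
    budget = max(1, int(fp_per_minute * n_minutes))
    return np.sort(rest_scores)[-budget]
```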