2023
DOI: 10.1101/2023.06.26.546557
Preprint

Deep speech-to-text models capture the neural basis of spontaneous speech in everyday conversations

Ariel Goldstein,
Haocheng Wang,
Leonard Niekerken
et al.

Abstract: Humans effortlessly use the continuous acoustics of speech to communicate rich linguistic meaning during everyday conversations. In this study, we leverage 100 hours (half a million words) of spontaneous open-ended conversations and concurrent high-quality neural activity recorded using electrocorticography (ECoG) to decipher the neural basis of real-world speech production and comprehension. Employing a deep multimodal speech-to-text model named Whisper, we develop encoding models capable of accurately predic…

Cited by 7 publications (4 citation statements) · References 56 publications
“…Many recent studies have begun to employ encoding models to predict neural responses during natural language processing using contextual embeddings derived from LLMs (Schrimpf et al, 2021;Caucheteux & King, 2022;Goldstein et al, 2022;Toneva et al, 2022;Cai et al, 2023;Goldstein, Wang, et al, 2023;Mischler et al, 2024;Zada et al, 2023). Our study demonstrates that aligning the neural activity in each brain into a shared, stimulus-driven feature space significantly enhances encoding performance.…”
Section: Discussion
confidence: 70%
“…The vast majority of prior work fitting electrode-wise linguistic encoding models does not evaluate whether models generalize across individual subjects (Goldstein et al 2022;Goldstein, Wang, et al 2023;Mischler et al 2024;cf. Zada et al 2023).…”
Section: Discussion
confidence: 99%
“…Several previous studies have probed intracranial neural responses during language comprehension (e.g., Fedorenko et al, 2016; Nelson et al, 2017; Woolnough et al, 2023; Desbordes et al, 2023; Goldstein et al, 2022; 2023). For example, Fedorenko et al (2016) reported sensitivity in language-responsive electrodes to both word meanings and combinatorial processing, in line with fMRI findings (e.g., Fedorenko et al, 2010; Bedny et al, 2011).…”
Section: Introduction
confidence: 99%