Syllable Inference as a Mechanism for Spoken Language Understanding

Brown, Marilyn R.; Tanenhaus, Michael K.; Dilley, Laura C.

doi:10.1111/tops.12529

Cited by 9 publications

(13 citation statements)

References 112 publications

“…Spoken language comprehension therefore relies on listeners going beyond the information given and inferring the presence of linguistic structure based on their knowledge of language. As such, many theories posit that linguistic structures—ranging from syllables to morphemes to “words” to syntactic structures—are constructed via an endogenous inference process [4–19]. On this view, also known as “analysis by synthesis” [20], speech triggers internal generation of memory representations (synthesis), which are compared to the sensory input (analysis).…”

Section: Introduction (mentioning)

confidence: 99%

Neural dynamics differentially encode phrases and sentences during spoken language comprehension

2022

Human language stands out in the natural world as a biological signal that uses a structured system to combine the meanings of small linguistic units (e.g., words) into larger constituents (e.g., phrases and sentences). However, the physical dynamics of speech (or sign) do not stand in a one-to-one relationship with the meanings listeners perceive. Instead, listeners infer meaning based on their knowledge of the language. The neural readouts of the perceptual and cognitive processes underlying these inferences are still poorly understood. In the present study, we used scalp electroencephalography (EEG) to compare the neural response to phrases (e.g., the red vase) and sentences (e.g., the vase is red), which were close in semantic meaning and had been synthesized to be physically indistinguishable. Differences in structure were well captured in the reorganization of neural phase responses in delta (approximately <2 Hz) and theta bands (approximately 2 to 7 Hz), and in power and power connectivity changes in the alpha band (approximately 7.5 to 13.5 Hz). Consistent with predictions from a computational model, sentences showed more power, more power connectivity, and more phase synchronization than phrases did. Theta–gamma phase–amplitude coupling occurred, but did not differ between the syntactic structures. Spectral–temporal response function (STRF) modeling revealed different encoding states for phrases and sentences, over and above the acoustically driven neural response. Our findings provide a comprehensive description of how the brain encodes and separates linguistic structures in the dynamics of neural responses. They imply that phase synchronization and strength of connectivity are readouts for the constituent structure of language.
The results provide a novel basis for future neurophysiological research on linguistic structure representation in the brain, and, together with our simulations, support time-based binding as a mechanism of structure encoding in neural dynamics.

“…This establishes that theories relying on particular acoustic properties being found in speech signals cannot account for human word recognition. Instead, listeners use multiple cues, on multiple levels of linguistic abstraction, to deduce the locations of word boundaries [18, 37–44], supporting proposals that view word segmentation as probabilistic inference [33, 45–47].…”

Section: Introduction (mentioning)

confidence: 72%

Adaptive pacing in word segmentation and the Vowel-onset Paced Syllable Inference model

Pittman-Polletta, Dilley

2023

Preprint

In speech perception, timing and content are interdependent. For example, in distal rate effects, context speech rate determines the number of words, syllables, and phonemes heard in an unchanging target speech segment. Such results confront psycholinguistic theory with the chicken-and-egg problem of concurrently inferring speech timing and content, and the interrelated issues of narrowing the search space of speech interpretations without bias and optimizing the speed/accuracy tradeoff in online processing. We propose listeners address these issues by managing the timing of speech-related computations. Specifically, we claim: (1) listeners model speech timing as part of a speaker model; (2) variable-length sequences of morphosyntactic units are the basic increments of speech inference; and (3) listeners adaptively schedule inferential updates and computationally intensive operations according to (4) fluctuations in uncertainty predicted by the speaker model. We illustrate these claims in a mechanistic model—Vowel-onset Paced Syllable Inference—explaining multiple psycholinguistic results, including distal rate effects.

“…Consistent with these findings, later work using eye-tracking methodology has also revealed that listeners can use information from preceding rhythmic patterns to predict upcoming lexical stress (e.g., “jury” versus “giraffe,” Brown et al, 2011, 2015), and studies using the event-related potential (ERP) paradigm show that preceding cues can support prediction of word boundaries and later lexical processing and interpretations of what was heard (Breen et al, 2014). Further, recent research has also shown that speech rate can also facilitate prediction of upcoming weak syllables (Baese-Berk et al, 2019; see also, Brown et al, 2021), suggesting that preceding prosodic cues can have a pervasive role in predicting upcoming words.…”

Section: Variation Flexibility and Cue Weighting (mentioning)

confidence: 99%

In Search of Salience: Focus Detection in the Speech of Different Talkers

Cutler

2021

Lang Speech

Many different prosodic cues can help listeners predict upcoming speech. However, no research to date has assessed listeners’ processing of preceding prosody from different speakers. The present experiments examine (1) whether individual speakers (of the same language variety) are likely to vary in their production of preceding prosody; (2) to the extent that there is talker variability, whether listeners are flexible enough to use any prosodic cues signaled by the individual speaker; and (3) whether types of prosodic cues (e.g., F0 versus duration) vary in informativeness. Using a phoneme-detection task, we examined whether listeners can entrain to different combinations of preceding prosodic cues to predict where focus will fall in an utterance. We used unsynthesized sentences recorded by four female native speakers of Australian English who happened to have used different preceding cues to produce sentences with prosodic focus: a combination of pre-focus overall duration cues, F0 and intensity (mean, maximum, range), and longer pre-target interval before the focused word onset (Speaker 1), only mean F0 cues, mean and maximum intensity, and longer pre-target interval (Speaker 2), only pre-target interval duration (Speaker 3), and only pre-focus overall duration and maximum intensity (Speaker 4). Listeners could entrain to almost every speaker’s cues (the exception being Speaker 4’s use of only pre-focus overall duration and maximum intensity), and could use whatever cues were available even when one of the cue sources was rendered uninformative. Our findings demonstrate both speaker variability and listener flexibility in the processing of prosodic focus.
