Segmentation Cues in Conversational Speech: Robust Semantics and Fragile Phonotactics

White, Laurence; Mattys, Sven L.; Wiget, Lukas

doi:10.3389/fpsyg.2012.00375

Cited by 9 publications

(6 citation statements)

References 28 publications

“…A similar word-onset electroencephalographic (EEG) response [19] emerged only after listeners learned to segment an artificial language into words [18], suggesting that it is not a response to local acoustic properties alone. A response tightly locked to word onset suggests that whichever cues listeners use to detect word boundaries [46–48], boundaries seem to be generally detected as they occur, rather than after incorporating cues occurring subsequent to word onset.…”

Section: Results (mentioning)

confidence: 99%

Transformation from auditory to linguistic representations across auditory cortex is rapid and attention dependent for continuous speech

Brodbeck, Simon. 2018. Preprint.

Summary: During speech perception, a central task of the auditory cortex is to analyze complex acoustic patterns to allow detection of the words that encode a linguistic message. It is generally thought that this process includes at least one intermediate, phonetic, level of representations [1–6], localized bilaterally in the superior temporal lobe [7–10]. Phonetic representations reflect a transition from acoustic to linguistic information, classifying acoustic patterns into linguistically meaningful units, which can serve as input to mechanisms that access abstract word representations [11–13]. While recent research has identified neural signals arising from successful recognition of individual words in continuous speech [14–17], no explicit neurophysiological signal has been found demonstrating the transition from acoustic/phonetic to symbolic, lexical representations. Here we report a response reflecting the incremental integration of phonetic information for word identification, dominantly localized to the left temporal lobe. The short response latency, approximately 110 ms relative to phoneme onset, suggests that phonetic information is used for lexical processing as soon as it becomes available. Responses also tracked word boundaries, confirming previous reports of immediate lexical segmentation [18,19]. These new results were further investigated using a cocktail-party paradigm [20,21] in which participants listened to a mix of two talkers, attending to one and ignoring the other. Analysis indicates neural lexical processing of only the attended, but not the unattended, speech stream. Thus, while responses to acoustic features reflect attention through selective amplification of attended speech, responses consistent with a lexical processing model reveal categorically selective processing.

“…Lexical cues refer to higher-level information arising from knowledge of individual words and syntactic, semantic, and pragmatic relations between words. Segmental cues include phonemic, phonotactic, and co-articulatory features, and suprasegmental cues refer to speech rhythm properties, including metrical stress (see White et al, 2012, for more detail of the segmentation cue categories).…”

Section: Introduction (mentioning)

confidence: 99%

A relationship between processing speech in noise and dysarthric speech

Borrie, Baese‐Berk, Engen, et al. 2017. The Journal of the Acoustical Society of America.

There is substantial individual variability in understanding speech in adverse listening conditions. This study examined whether a relationship exists between processing speech in noise (environmental degradation) and dysarthric speech (source degradation), with regard to intelligibility performance and the use of metrical stress to segment the degraded speech signals. Ninety native speakers of American English transcribed speech in noise and dysarthric speech. For each type of listening adversity, transcriptions were analyzed for proportion of words correct and lexical segmentation errors indicative of stress cue utilization. Consistent with the hypotheses, intelligibility performance for speech in noise was correlated with intelligibility performance for dysarthric speech, suggesting similar cognitive-perceptual processing mechanisms may support both. The segmentation results also support this postulation. While stress-based segmentation was stronger for speech in noise relative to dysarthric speech, listeners utilized metrical stress to parse both types of listening adversity. In addition, reliance on stress cues for parsing speech in noise was correlated with reliance on stress cues for parsing dysarthric speech. Taken together, the findings demonstrate a preference to deploy the same cognitive-perceptual strategy in conditions where metrical stress offers a route to segmenting degraded speech.

“…Question-answer sequences were extracted from corpora of spontaneous conversations in Dutch (Van Son et al 2008) and English (White et al 2012). As described below, the corpora were both comprised of free conversations between people who were either friends or colleagues in university environments.…”

Section: Corpora (mentioning)

confidence: 99%

“…The English dataset was sampled from a corpus of eight dyadic lab-based conversations (White et al 2012, which also includes read sentences and map task speech, not analysed here). Each conversational recording was between 16 and 22 min long.…”

Section: Corpora (mentioning)

confidence: 99%

Speech Rate and Turn-Transition Pause Duration in Dutch and English Spontaneous Question-Answer Sequences

2023

Self Cite

The duration of inter-speaker pauses is a pragmatically salient aspect of conversation that is affected by linguistic and non-linguistic context. Theories of conversational turn-taking imply that, due to listener entrainment to the flow of syllables, a higher speech rate will be associated with shorter turn-transition times (TTT). Previous studies have found conflicting evidence, however, some of which may be due to methodological differences. In order to test the relationship between speech rate and TTT, and how this may be modulated by other dialogue factors, we used question-answer sequences from spontaneous conversational corpora in Dutch and English. As utterance-final lengthening is a local cue to turn endings, we also examined the impact of utterance-final syllable rhyme duration on TTT. Using mixed-effect linear regression models, we observed evidence for a positive relationship between speech rate and TTT: thus, a higher speech rate is associated with longer TTT, contrary to most theoretical predictions. Moreover, for answers following a pause (“gaps”) there was a marginal interaction between speech rate and final rhyme duration, such that relatively long final rhymes are associated with shorter TTT when foregoing speech rate is high. We also found evidence that polar (yes/no) questions are responded to with shorter TTT than open questions, and that direct answers have shorter TTT than responses that do not directly answer the questions. Moreover, the effect of speech rate on TTT was modulated by question type. We found no predictors of the (negative) TTT for answers that overlap with the foregoing questions. Overall, these observations suggest that TTT is governed by multiple dialogue factors, potentially including the salience of utterance-final timing cues. Contrary to some theoretical accounts, there is no strong evidence that higher speech rates are consistently associated with shorter TTT.
