The temporal lobe in the left hemisphere has long been implicated in the perception of speech sounds. Little is known, however, regarding the specific function of different temporal regions in the analysis of the speech signal. Here we show that an area extending along the left middle and anterior superior temporal sulcus (STS) is more responsive to familiar consonant-vowel syllables during an auditory discrimination task than to comparably complex auditory patterns that cannot be associated with learned phonemic categories. In contrast, areas in the dorsal superior temporal gyrus bilaterally, closer to primary auditory cortex, are activated to the same extent by the phonemic and nonphonemic sounds. Thus, the left middle/anterior STS appears to play a role in phonemic perception. It may represent an intermediate stage of processing in a functional pathway linking areas in the bilateral dorsal superior temporal gyrus, presumably involved in the analysis of physical features of speech and other complex non-speech sounds, to areas in the left anterior STS and middle temporal gyrus that are engaged in higher-level linguistic processes.
Purpose: In this study, the authors examined whether rhythm metrics capable of distinguishing languages with high and low temporal stress contrast also can distinguish among control and dysarthric speakers of American English with perceptually distinct rhythm patterns. Methods: Acoustic measures of vocalic and consonantal segment durations were obtained for speech samples from 55 speakers across 5 groups (hypokinetic, hyperkinetic, flaccid-spastic, ataxic dysarthrias, and controls). Segment durations were used to calculate standard and new rhythm metrics. Discriminant function analyses (DFAs) were used to determine which sets of predictor variables (rhythm metrics) best discriminated between groups (control vs. dysarthrias; and among the 4 dysarthrias). A cross-validation method was used to test the robustness of each original DFA. Results: The majority of classification functions were more than 80% successful in classifying speakers into their appropriate group. New metrics that combined successive vocalic and consonantal segments emerged as important predictor variables. DFAs pitting each dysarthria group against the combined others resulted in unique constellations of predictor variables that yielded high levels of classification accuracy. Conclusions: This study confirms the ability of rhythm metrics to distinguish control speech from dysarthrias and to discriminate dysarthria subtypes. Rhythm metrics show promise for use as a rational and objective clinical tool.
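The standard rhythm metrics referenced in this abstract are computed directly from lists of vocalic and consonantal segment durations. A minimal sketch of three widely used metrics (%V, ΔV/ΔC, and the normalized pairwise variability index, nPVI); the duration values below are hypothetical, and the abstract's "new" combined-segment metrics are not reproduced here:

```python
from statistics import mean, pstdev

def percent_v(vocalic, consonantal):
    """%V: proportion of total utterance duration that is vocalic."""
    total = sum(vocalic) + sum(consonantal)
    return 100 * sum(vocalic) / total

def delta(durations):
    """ΔV or ΔC: standard deviation of segment durations."""
    return pstdev(durations)

def npvi(durations):
    """nPVI: mean normalized difference between successive durations."""
    pairs = zip(durations, durations[1:])
    return 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in pairs)

# Hypothetical segment durations (seconds) for one phrase
voc = [0.12, 0.08, 0.15, 0.06]
con = [0.09, 0.11, 0.07, 0.10]
print(round(percent_v(voc, con), 1))  # vocalic proportion of the phrase
print(round(npvi(voc), 1))            # vocalic nPVI
```

In a study like the one described, vectors of such metrics per speaker would then feed a discriminant function analysis to separate the groups.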
This study is the third in a series that has explored the source of intelligibility decrement in dysarthria by jointly considering signal characteristics and the cognitive-perceptual processes employed by listeners. A paradigm of lexical boundary error analysis was used to examine this interface by manipulating listener constraints with a brief familiarization procedure. If familiarization allows listeners to extract relevant segmental and suprasegmental information from dysarthric speech, they should obtain higher intelligibility scores than nonfamiliarized listeners, and their lexical boundary error patterns should approximate those obtained in misperceptions of normal speech. Listeners transcribed phrases produced by speakers with either hypokinetic or ataxic dysarthria after being familiarized with other phrases produced by these speakers. Data were compared to those of nonfamiliarized listeners [Liss et al., J. Acoust. Soc. Am. 107, 3415-3424 (2000)]. The familiarized groups obtained higher intelligibility scores than nonfamiliarized groups, and the effects were greater when the dysarthria type of the familiarization procedure matched the dysarthria type of the transcription task. Remarkably, no differences in lexical boundary error patterns were discovered between the familiarized and nonfamiliarized groups. Transcribers of the ataxic speech appeared to have difficulty distinguishing strong and weak syllables in spite of the familiarization. Results suggest that intelligibility decrements arise from the perceptual challenges posed by the degraded segmental and suprasegmental aspects of the signal, but that this type of familiarization process may differentially facilitate mapping segmental information onto existing phonological categories.
This investigation evaluated a possible source of reduced intelligibility in hypokinetic dysarthric speech, namely the mismatch between listeners' perceptual strategies and the acoustic information available in the dysarthric speech signal. A paradigm of error analysis was adopted in which listener transcriptions of phrases were coded for the presence and type of word boundary errors. Seventy listeners heard 60 phrases produced by speakers with hypokinetic dysarthria. The six-syllable phrases alternated strong and weak syllables and ranged in length from three to five words. Lexical boundary violations were defined as erroneous insertions or deletions of lexical boundaries that occurred either before strong or before weak syllables. A total of 1596 lexical boundary errors in the listeners' transcriptions was identified unanimously by three independent judges. The pattern of errors generally conformed with the predictions of the Metrical Segmentation Strategy hypothesis [Cutler and Norris, J. Exp. Psychol. 14, 113-121 (1988)] which posits that listeners attend to strong syllables to identify word onsets. However, the strength of adherence to this pattern varied across speakers. Comparison of acoustic evidence of syllabic strength to lexical boundary error patterns revealed a source of intelligibility deficit associated with this particular type of dysarthric speech pattern.
It has been posited that the role of prosody in lexical segmentation is elevated when the speech signal is degraded or unreliable. Using predictions from Cutler and Norris' [J. Exp. Psychol. Hum. Percept. Perform. 14, 113-121 (1988)] metrical segmentation strategy hypothesis as a framework, this investigation examined how individual suprasegmental and segmental cues to syllabic stress contribute differentially to the recognition of strong and weak syllables for the purpose of lexical segmentation. Syllabic contrastivity was reduced in resynthesized phrases by systematically (i) flattening the fundamental frequency (F0) contours, (ii) equalizing vowel durations, (iii) weakening strong vowels, (iv) combining the two suprasegmental cues, i.e., F0 and duration, and (v) combining the manipulation of all cues. Results indicated that, despite similar decrements in overall intelligibility, F0 flattening and the weakening of strong vowels had a greater impact on lexical segmentation than did equalizing vowel duration. Both combined-cue conditions resulted in greater decrements in intelligibility, but with no additional negative impact on lexical segmentation. The results support the notion of F0 variation and vowel quality as primary conduits for stress-based segmentation and suggest that the effectiveness of stress-based segmentation with degraded speech must be investigated relative to the suprasegmental and segmental impoverishments occasioned by each particular degradation.
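The contour-flattening and duration-equalizing manipulations described above can be illustrated conceptually on an F0 track and a vowel-duration list. This is a sketch of the idea only, with hypothetical values; actual resynthesis of the audio (e.g., via PSOLA-style pitch and duration modification) is not shown:

```python
from statistics import mean

def flatten_f0(f0_track):
    """Replace every voiced frame's F0 with the utterance mean F0,
    removing the pitch movement that cues syllabic stress.
    Unvoiced frames are marked 0 and left unchanged."""
    voiced = [f for f in f0_track if f > 0]
    m = mean(voiced)
    return [m if f > 0 else 0 for f in f0_track]

def equalize_durations(durations):
    """Map every vowel duration to the mean vowel duration,
    removing durational cues to stress."""
    m = mean(durations)
    return [m for _ in durations]

# Hypothetical frame-wise F0 values in Hz (0 = unvoiced frame)
track = [0, 180, 210, 240, 0, 150, 130, 0]
print(flatten_f0(track))
```

Combining the two functions corresponds to the study's joint suprasegmental manipulation, while vowel weakening would additionally alter the spectral content of strong vowels.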