Speech sound contrasts differ along multiple phonetic dimensions. During speech perception, listeners must decide which cues are relevant and determine the relative importance of each cue, while also integrating other, signal-external cues. The comparison of cue weighting in perception and production bears on a range of theoretical issues, including the processes underlying sound change, the time course of learning, the nature of cues, and the perception-production interface. Research examining the relative alignment of cue weighting across the two modalities, at both the community and the individual level, has revealed both parallels and asymmetries between them. The extraordinarily wide range of ways that have been used to conceptualize and quantify cue weights reflects the inherent theoretical, methodological, and analytical differences between the two modalities. More consideration of the choice of analytical metrics, explicit discussion of the theoretical assumptions that underlie them, and systematic investigation of different types of cues will lead to more generalizable findings that can be incorporated into computationally implementable models of speech processing.
This article is categorized under: Linguistics > Language in Mind and Brain

Keywords: cue weighting, phonetics, speech perception, speech production
INTRODUCTION

Speech sound contrasts differ on multiple phonetic dimensions: for example, the English sounds /b/ and /p/ differ systematically in voice onset time (VOT, the amount of time between the stop release and the onset of voicing), but also differ, albeit less reliably, in other dimensions, including the duration of the stop closure and the fundamental frequency (f0, corresponding to perceived pitch) at the onset of voicing (Lisker, 1986). Phonetic cue weighting, or the relative use of these acoustic "cues," can be conceptualized and quantified in the context of both production and perception. For example, as shown in Figure 1, English speakers' productions of /b/ and /p/ show large and consistent differences in VOT, such that /b/ vs. /p/ category membership can be predicted well using this cue alone. On the other hand, while /b/ is followed by a lower f0 than /p/ on average, this difference is much smaller and less consistent, such that f0 is only weakly predictive of category membership. This asymmetry between the two cues is reflected in perception: when asked to categorize sounds varying in aspiration duration and f0, listeners' responses are mainly determined by aspiration (the primary cue), with the value of f0 playing a secondary, albeit still detectable, role (Abramson & Lisker, 1985).
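The production-side asymmetry described above can be made concrete with a small sketch. It assumes one possible metric, a standardized mean difference (Cohen's d) between the /b/ and /p/ distributions on each cue; this is only one of many ways a production cue weight might be quantified. The token values are simulated from hypothetical means and standard deviations chosen purely for illustration. They are not the data behind Figure 1, and the helper names (`cohens_d`, `simulate_tokens`) are likewise illustrative.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference (b minus a), using the pooled SD."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd


def simulate_tokens(rng, n, vot_mean, vot_sd, f0_mean, f0_sd):
    """Draw n (VOT in ms, onset f0 in Hz) pairs from independent Gaussians."""
    return [(rng.gauss(vot_mean, vot_sd), rng.gauss(f0_mean, f0_sd))
            for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical category parameters (assumptions, not measured values):
# a large, consistent VOT difference and a small, overlapping f0 difference.
b_tokens = simulate_tokens(rng, 200, vot_mean=10, vot_sd=8, f0_mean=200, f0_sd=20)
p_tokens = simulate_tokens(rng, 200, vot_mean=60, vot_sd=15, f0_mean=210, f0_sd=20)

# One effect size per cue: a larger value means the cue separates the
# two categories more reliably in production.
d_vot = cohens_d([vot for vot, _ in b_tokens], [vot for vot, _ in p_tokens])
d_f0 = cohens_d([f0 for _, f0 in b_tokens], [f0 for _, f0 in p_tokens])

print(f"VOT effect size: {d_vot:.2f}")
print(f"f0 effect size:  {d_f0:.2f}")
```

Under these assumed parameters, the VOT effect size comes out several times larger than the f0 effect size, mirroring the primary versus secondary cue distinction. A perception-side counterpart would typically be estimated from listeners' categorization responses rather than from acoustic distributions.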