The perception of prosodic prominence in spontaneous speech is investigated through an online prosody transcription task with untrained listeners. Prominence is indexed through a probabilistic prominence score assigned to each word based on the proportion of transcribers who perceived the word as prominent. Correlation and regression analyses between perceived prominence, acoustic measures, and measures of a word's information status are conducted to test three hypotheses: (i) prominence perception is signal-driven, influenced by acoustic factors reflecting speakers' productions; (ii) perception is expectation-driven, influenced by the listener's prior experience of word frequency and repetition; (iii) any observed influence of word frequency on perceived prominence is mediated through the acoustic signal. Results show correlates of perceived prominence in acoustic measures, in word log-frequency, and in the repetition index of a word, consistent with both signal-driven and expectation-driven hypotheses of prominence perception. However, the acoustic correlates of perceived prominence differ somewhat from the correlates of word frequency, suggesting an independent effect of frequency on prominence perception. A speech processing account is offered as a model of signal-driven and expectation-driven effects on prominence perception, where prominence ratings are a function of the ease of lexical processing, as measured through the activation levels of lexical and sub-lexical units.
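The probabilistic prominence score and its relation to word frequency can be illustrated with a minimal sketch: each word's score is the proportion of transcribers who marked it prominent, which is then correlated with log-frequency. All data and names below are invented for illustration and are not taken from the study.

```python
# Hedged sketch: p-score = proportion of transcribers marking a word prominent,
# correlated with hypothetical corpus log-frequency. Invented toy data only.
from math import log, sqrt

# One binary judgment per transcriber (1 = heard as prominent).
judgments = {
    "cat":   [1, 1, 1, 0, 1, 1],   # content word, low frequency
    "the":   [0, 0, 0, 0, 1, 0],   # function word, high frequency
    "again": [1, 0, 1, 1, 0, 1],
}
# Hypothetical frequencies per million words.
freq = {"cat": 20.0, "the": 60000.0, "again": 500.0}

def p_score(marks):
    """Proportion of transcribers who perceived the word as prominent."""
    return sum(marks) / len(marks)

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

words = sorted(judgments)
scores = [p_score(judgments[w]) for w in words]
logf = [log(f) for f in (freq[w] for w in words)]
r = pearson_r(logf, scores)  # expected negative: frequent words
print(dict(zip(words, scores)), round(r, 3))  # tend to sound less prominent
```

In the paper's actual analyses this correlation is supplemented by regression models that add acoustic predictors, allowing the frequency effect to be tested after the signal-driven factors are accounted for.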
Coarticulation is a source of acoustic variability for vowels, but how large is this effect relative to other sources of variance? We investigate acoustic effects of anticipatory V-to-V coarticulation relative to variation due to the following C and the individual speaker. We examine F1 and F2 from V1 in 48 V1-C#V2 contexts produced by 10 speakers of American English. ANOVA reveals significant effects of both V2 and C on F1 and F2 measures of V1. The influence of V2 and C on acoustic variability relative to that of speaker and target vowel identity is evaluated using hierarchical linear regression. Speaker and target vowel account for roughly 80% of the total variance in F1 and F2, but when this variance is partialed out, C and V2 account for another 18% (F1) and 63% (F2) of the remaining target vowel variability. Multinomial logistic regression (MLR) models are constructed to test the power of target vowel F1 and F2 for predicting C and V2 of the upcoming context. Prediction accuracy is 58% for C-Place, 76% for C-Voicing and 54% for V2, but only when variance due to other sources is factored out. MLR is discussed as a model of the parsing mechanism in speech perception.
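The hierarchical-regression logic described above can be sketched as follows: fit a baseline model with speaker and target-vowel factors, then ask how much of the *residual* variance the upcoming context explains. The simulation below uses arbitrary, invented effect sizes, not the paper's data; it only demonstrates the variance-partitioning computation.

```python
# Hedged sketch of hierarchical regression for variance partitioning.
# Simulated formant data; effect sizes are arbitrary, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
n = 600
speaker = rng.integers(0, 10, n)   # 10 speakers
vowel = rng.integers(0, 4, n)      # 4 target-vowel categories
v2 = rng.integers(0, 3, n)         # 3 upcoming-vowel (V2) contexts

# Simulated F1: large speaker/vowel effects, smaller V2 coarticulation effect.
f1 = 500 + 30 * speaker + 80 * vowel + 15 * v2 + rng.normal(0, 10, n)

def dummies(codes, k):
    """One-hot encode an integer factor."""
    return np.eye(k)[codes]

def r2(X, y):
    """R^2 of an OLS fit with intercept (lstsq handles rank deficiency)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

base = np.column_stack([dummies(speaker, 10), dummies(vowel, 4)])
full = np.column_stack([base, dummies(v2, 3)])
r2_base, r2_full = r2(base, f1), r2(full, f1)

# Share of the REMAINING variance accounted for by the context factor,
# analogous to partialing out speaker and target-vowel variance first.
context_share = (r2_full - r2_base) / (1 - r2_base)
print(round(r2_base, 3), round(context_share, 3))
```

The MLR step reverses the direction of inference: rather than asking how much variance the context explains, it asks how accurately the (residualized) F1/F2 values predict the upcoming C or V2 category, which is why accuracy is reported only after other variance sources are factored out.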