When annotating a speech signal using an autosegmental-metrical model of intonation, transcribers associate portions of the F0 contour with labels from a finite inventory of tonal categories. In the models we are concerned with here, these categories have the status of phonological units (phonological form), bridging the intrinsic variability of the speech signal (substance) with the intrinsic fuzziness of post-lexical function (meaning). This, together with the relatively small size of the label inventory, precludes a one-to-one relationship between form and substance, and/or between form and function. A Neapolitan Italian corpus of read speech is used to investigate the distributional properties of two pitch accents that have been studied extensively with respect to substance (the alignment of F0 peaks) and meaning (sentence modality). Although there is a general consensus that peaks in this variety are aligned earlier in declaratives than in interrogatives, evidence is provided of contexts in which the converse is true, i.e., in which interrogative peaks are even earlier than their declarative counterparts. In this respect, interrogatives have a richer internal structure than declaratives. We argue that differences in how variably a prosodic category is encoded can be dealt with in an intonation transcription system, as long as this system relates phonological form (the choice of pitch accent in this case) both to phonetic substance and to meaning in a transparent way.
1 This is true of any intonation transcription system, although priorities vary. For instance, within the British school, Crystal prioritizes substance and sees intonation as "the product of the interaction of features from different prosodic systems – tone, pitch-range, loudness, rhythmicality and tempo in particular" (Crystal, 1975, p. 283), whereas Halliday (1967) emphasizes meaning, in accordance with his understanding of intonation as a system within the grammar of English. For recent views on the importance of meaning in intonation transcription, see also Arvaniti (2016) and Cole and Shattuck-Hufnagel (2016).
In this work we propose the use of Functional Data Analysis (FDA) as a powerful methodology for problems in which multiple continuous speech parameters have to be analyzed jointly. A production study on contrastive focus placement in Neapolitan Italian is used as an illustration. Two features are analyzed, viz. f0 and relative speech rate, both expressed as continuous functions of time. The results show that known facts about the prosody of Neapolitan Italian emerge from the data, and that other interesting local or cross-feature relationships between contour traits also appear. Thus, FDA results can be used as guidance in the exploration of speech feature contour shapes, an operation that was carried out manually in previous speech research. The capability of jointly analyzing multiple continuous features provides a valuable improvement not only for speech analysis but also for speech re-synthesis.
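As a rough illustration of what analyzing two contours jointly can look like, the sketch below resamples each contour onto a shared normalized time axis and runs a principal component analysis on the concatenated f0 and speech-rate curves. It is a minimal sketch under stated assumptions, not the procedure used in the study: a full FDA pipeline would typically smooth the curves with basis functions and register them on landmarks, whereas here plain linear interpolation and a discretized PCA are swapped in. The function names `resample_contour` and `joint_fpca`, and the choice to scale each feature by its overall standard deviation, are hypothetical.

```python
import numpy as np


def resample_contour(times, values, n_points=50):
    """Linearly interpolate one sampled contour onto a normalized [0, 1] time grid.

    Assumption: `times` is strictly increasing. Linear interpolation stands in
    for the basis-function smoothing a full FDA pipeline would use.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    norm_t = (times - times[0]) / (times[-1] - times[0])
    grid = np.linspace(0.0, 1.0, n_points)
    return np.interp(grid, norm_t, values)


def joint_fpca(f0_curves, rate_curves, n_components=2):
    """Discretized joint PCA over paired f0 and speech-rate curves.

    Both inputs are arrays of shape (n_utterances, n_points) on the same grid.
    Each feature is centered on its mean curve and divided by its overall
    standard deviation so that neither feature dominates because of its units
    (a hypothetical weighting choice).
    """
    f0 = np.asarray(f0_curves, dtype=float)
    rate = np.asarray(rate_curves, dtype=float)

    f0_c = f0 - f0.mean(axis=0)
    rate_c = rate - rate.mean(axis=0)
    f0_c = f0_c / f0_c.std()
    rate_c = rate_c / rate_c.std()

    # One row per utterance: [f0 curve | rate curve]
    joint = np.hstack([f0_c, rate_c])

    # PCA via singular value decomposition of the centered data matrix
    u, s, vt = np.linalg.svd(joint, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]
```

Each returned component has length `2 * n_points`: its first half describes a mode of variation in f0 and its second half the co-occurring variation in speech rate, which is what makes cross-feature relationships visible in a single score per utterance.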