A deep learning Phonet model was evaluated as a method to measure lenition. Unlike quantitative acoustic methods, this approach trains recurrent networks to estimate the posterior probabilities of sonorant and continuant phonological features in a corpus of Argentinian Spanish. When applied to intervocalic and post-nasal voiced and voiceless stops, the approach yielded lenition patterns similar to those previously reported, and additional patterns also emerged. The results suggest the validity of the approach as an alternative or addition to quantitative acoustic measures of lenition.
This study compared a new approach to measuring lenition with traditional acoustic-based methods. In this new approach, degrees of lenition are estimated from posterior probabilities generated by recurrent neural networks trained to recognize the sonorant and continuant phonological features. These two phonological features capture the range of surface manifestations, from a fricative to an approximant, of lenited voiced and voiceless stops in Spanish. Input to the networks is Mel-filtered log-energy computed from 25-ms windowed frames of each 0.5-s chunk of the input signals. When applied to lenition of intervocalic voiced and voiceless stops, /p, t, k, b, d, g/, in the corpus of Argentinian Spanish built by Google, the new approach yielded lenition patterns largely similar to those obtained using a quantitative acoustic method. Specifically, both approaches revealed that voiced stops were more lenited than voiceless stops, that lenition was more likely in unstressed syllables than in stressed syllables, and that degrees of lenition vary with the place of articulation of the target phoneme and the height of surrounding vowels. However, a greater amount of variance was accounted for by the absolute and relative (to neighbouring segment) intensity measures of lenition in the acoustic method than by the phonological posteriors.
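The input representation described above can be illustrated with a minimal sketch. Only the 25-ms window and the 0.5-s chunk length come from the abstract; the 16 kHz sampling rate, 10-ms hop, 512-point FFT, 33 Mel filters, Hamming window and the function names are assumptions made for illustration, not the authors' configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to sr/2."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_energies(chunk, sr=16000, win_s=0.025, hop_s=0.010,
                     n_filters=33, n_fft=512):
    """Log Mel-filterbank energies for one chunk of audio samples.

    win_s=0.025 follows the abstract; sr, hop_s, n_filters and n_fft
    are assumed values chosen only to make the sketch concrete.
    """
    chunk = np.asarray(chunk, dtype=float)
    win = int(round(win_s * sr))
    hop = int(round(hop_s * sr))
    n_frames = 1 + (len(chunk) - win) // hop
    # Index matrix: one row of sample indices per frame.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = chunk[idx] * np.hamming(win)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    fb = mel_filterbank(n_filters, n_fft, sr)
    # Small constant avoids log(0) on silent frames.
    return np.log(power @ fb.T + 1e-10)

# A 0.5-s chunk of synthetic noise stands in for real speech.
rng = np.random.default_rng(0)
features = log_mel_energies(rng.standard_normal(8000))
```

Under these assumed settings each 0.5-s chunk becomes a frames-by-filters matrix, which is the kind of sequence a recurrent network would consume frame by frame.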
Alcohol is known to impair fine articulatory control and movements. In drunken speech, incomplete closure of the vocal tract can result in deaffrication of the English affricate sounds /tʃ/ and /ʤ/, spirantization (fricative-like production) of the stop consonants and palatalization (retraction of place of articulation) of the alveolar fricative /s/ (produced as /ʃ/). Such categorical segmental errors have been well reported. This study employs a phonologically-informed neural network approach to estimate degrees of deaffrication of /tʃ/ and /ʤ/, spirantization of /t/ and /d/ and place retraction for /s/ in a corpus of intoxicated English speech. Recurrent neural networks were trained to recognize the relevant phonological features [anterior], [continuant] and [strident] in a control speech corpus. Their posterior probabilities were computed over the segments produced under intoxication. The results revealed both categorical and gradient errors and thus suggested that this new approach could reliably quantify fine-grained errors in intoxicated speech.
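One plausible way to obtain a per-segment value from frame-level network outputs is to average the posteriors over the frames that fall inside the segment. The sketch below assumes this averaging, a 10-ms frame hop and the hypothetical function name `segment_posterior`; the abstract does not state how the posteriors were aggregated.

```python
import numpy as np

def segment_posterior(frame_posteriors, start_s, end_s, hop_s=0.010):
    """Mean posterior per feature over the frames spanning one segment.

    frame_posteriors: array of shape (n_frames, n_features), e.g. columns
    for [anterior], [continuant] and [strident].
    start_s, end_s: segment boundaries in seconds (e.g. from a forced alignment).
    hop_s: assumed frame hop; averaging itself is also an assumption.
    """
    frame_posteriors = np.asarray(frame_posteriors, dtype=float)
    first = int(round(start_s / hop_s))
    # Always include at least one frame, even for very short segments.
    last = max(first + 1, int(round(end_s / hop_s)))
    return frame_posteriors[first:last].mean(axis=0)

# Synthetic posteriors: 10 frames, 3 features, values rising over time.
demo = np.tile(np.arange(10)[:, None] / 10.0, (1, 3))
seg_mean = segment_posterior(demo, start_s=0.02, end_s=0.05)
```

A lower mean [anterior] value for /s/, for instance, would then be read as a more retracted production, in line with the gradient interpretation described above.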
Intoxication has a well-known effect on speech production. Lester and Skousen (1974) reported that the place of articulation for /s/ is retracted and /tʃ/ and /ʤ/ are deaffricated (i.e., substituted by a non-affricate segment) in drunken speech. Zihlmann (2017) further established the robustness of deaffrication, as it cannot be consciously suppressed under intoxication. Using these prevalent speech errors as test cases, this study extends a phonologically-informed neural network approach to the study of intoxicated speech. The approach has been used successfully to measure pathological speech and lenition patterns in healthy speakers. Degrees of place retraction for /s/ and deaffrication of /tʃ/ and /ʤ/ are estimated from posterior probabilities calculated by recurrent neural networks trained to recognize the [anterior], [continuant] and [strident] features. When applied to a corpus of intoxicated English speech, preliminary results suggested that sober versus drunken state could be reliably predicted by the three posterior probabilities. The directions of the effects are largely in line with previous studies. For example, /tʃ/ and /ʤ/ are more fricated (higher strident and continuant probabilities), and /s/ is more retracted (lower anterior probability) in drunken compared to sober speech. The results suggest that intoxicated speech can be reliably quantified by this new approach.