A new descriptive framework for voice quality perception (Kreiman, Gerratt, Kempster, Erman, & Berke, 1993) states that when listeners rate a voice on some quality dimension (e.g., roughness), they compare the stimulus presented to an internal standard or scale. Hypothetically, substituting explicit, external standards for these unstable internal standards should improve listener reliability. Further, the framework suggests that internal standards for vocal qualities are inherently unstable, and may be influenced by factors other than the physical signal being judged. Among these factors, context effects may cause drift in listeners’ voice ratings by influencing the internal standard against which judgments are made. To test these hypotheses, we asked 12 clinicians to judge the roughness of 22 synthetic stimuli using two scales: a traditional 5-point equal-appearing interval (EAI) scale and a scale with explicit anchor stimuli for each scale point. The stimulus set included a relatively large number of normal and mildly rough voices. We predicted that this would produce an increase in the perceived roughness of moderately rough stimuli over time for the EAI ratings, but not for the explicitly anchored ratings. Ratings made using the anchored scale were significantly more reliable than those gathered using the unanchored paradigm. Further, as predicted, ratings on the unanchored EAI scale drifted significantly within a listening session in the direction expected, but ratings on the anchored scale did not. These results are consistent with our framework and suggest that explicitly anchored paradigms for voice quality evaluation might improve both research and clinical practice.
Purpose: Many researchers have studied the acoustics, physiology, and perceptual characteristics of the voice source, but despite significant attention, it remains unclear which aspects of the source should be quantified and how measurements should be made. In this study, the authors examined the relationships among a number of existing measures of the glottal source spectrum, along with the association of these measures to overall spectral shapes and to glottal pulse shapes, to determine which measures of the source best capture information about the shapes of glottal pulses and glottal source spectra. Method: Seventy-eight different measures of source spectral shapes were made on the voices of 70 speakers. Principal components analysis was applied to measurement data, and the resulting factors were compared with factors similarly derived from oral speech spectra and glottal pulses. Results: Results revealed high levels of duplication and overlap among existing measures of source spectral slope. Further, existing measures were not well aligned with patterns of spectral variability. In particular, existing spectral measures do not appear to model the higher frequency parts of the source spectrum adequately. Conclusion: The failure of existing measures to adequately quantify spectral variability may explain why results of studies examining the perceptual importance of spectral slope have not produced consistent results. Because variability in the speech signal is often perceptually salient, these results suggest that most existing measures of source spectral slope are unlikely to be good predictors of voice quality.
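As a rough illustration of how principal components analysis can expose duplication among measures, the sketch below standardizes a speakers-by-measures matrix and reports the share of variance carried by each component. The function name and the choice to z-score each measure are assumptions made for this example; the abstract does not describe the authors' exact procedure.

```python
import numpy as np

def pca_explained_variance(measures):
    """Hypothetical helper: PCA on a (n_speakers, n_measures) array.

    Returns (explained_variance_ratio, loadings), where each row of
    `loadings` is one principal axis in the space of measures.
    """
    X = np.asarray(measures, dtype=float)
    X = X - X.mean(axis=0)              # center each measure
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                 # avoid dividing by zero for constant columns
    X = X / std                         # z-score (an assumption of this sketch)
    # SVD of the standardized data gives the principal axes directly
    _, singular_values, loadings = np.linalg.svd(X, full_matrices=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    return variances / variances.sum(), loadings
```

If a handful of components account for nearly all of the variance across many measures, those measures are largely redundant with one another.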
To enable differences in modes of glottal vibration to be studied, glottal air volume velocity waveforms can be recovered from speech recordings by inverse filtering. Most previous published work in this area has made use of analog filters. Digital inverse filters offer many advantages, including the ability to change filter settings to match changing vocal tract filter functions. Although the theory and many of the methods necessary for digital inverse filtering have been described in the literature, a straightforward description of the entire process has been lacking. The digital inverse filtering process developed for linguistic research at the UCLA Phonetics Laboratory is described in detail in this paper, with the intention of facilitating such work at other institutions.
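For orientation only, a generic form of digital inverse filtering can be sketched with linear prediction: estimate an all-pole vocal tract model from a frame of speech, then pass the frame through the corresponding all-zero filter. This is a textbook autocorrelation-method LPC sketch, not the UCLA Phonetics Laboratory procedure that the paper describes; the function names and the default model order are assumptions.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC. Returns [1, a1, ..., ap] for A(z)."""
    windowed = frame * np.hamming(len(frame))
    n = len(windowed)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(windowed[:n - k], windowed[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def inverse_filter(frame, order=12):
    """Apply the FIR inverse filter A(z) to the (unwindowed) frame.

    The output is the prediction residual, a rough estimate of the
    source signal with the modeled vocal tract resonances removed.
    """
    frame = np.asarray(frame, dtype=float)
    a = lpc_coefficients(frame, order)
    residual = np.convolve(frame, a)[:len(frame)]
    return residual, a
```

Because the coefficients are re-estimated for every frame, the filter settings can follow a changing vocal tract, which is the advantage of digital filters noted in the abstract. Recovering a volume velocity waveform would additionally require compensating for lip radiation (for example by integration), which this sketch omits.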
Voice quality is an important perceptual cue in many disciplines, but knowledge of its nature is limited by a poor understanding of the relevant psychoacoustics. This article (aimed at researchers studying voice, speech, and vocal behavior) describes the UCLA voice synthesizer, software for voice analysis and synthesis designed to test hypotheses about the relationship between acoustic parameters and voice quality perception. The synthesizer provides experimenters with a useful tool for creating and modeling voice signals. In particular, it offers an integrated approach to voice analysis and synthesis, and allows easy, precise, spectral-domain manipulations of the harmonic voice source. The synthesizer operates in near real-time, using a parsimonious set of acoustic parameters for the voice source and vocal tract that a user can modify to accurately copy the quality of most normal and pathological voices. The software, user’s manual, and audio files may be downloaded from http://mc.psychonomic-journals.org/content/supplemental. Future updates may be downloaded from www.surgery.medsch.ucla.edu/glottalaffairs/.
Crude measures of spectral tilt (F0-H2 difference and F0-F1 difference) have been demonstrated to be useful for distinguishing phonation types. However, with such methods, it is difficult to control for differences due to variations in vowel quality and F0. In order to place such measures on a firmer foundation, the differences in vowel quality can be compensated for by inverse filtering. This technique has been used for analyzing vowels in languages having contrasting phonation types. FM recordings of airflow data in Burmese and Hmong, and ordinary AM audio recordings of !Xóõ and Jalapa Mazatec were analyzed. AM recordings can be used, as phase distortion may be neglected while working in the frequency domain. FFT spectra were made of the inverse-filtered waveforms. We considered several questions such as computational methods for deriving the amplitudes of harmonics, the expected dependency of amplitude on frequency, and the appropriate range of frequencies to examine. The results show that measures of spectral tilt obtained from inverse-filtered data can be used to characterize differences in phonation type. [Work supported by NINCDS.]
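One simple way to derive harmonic amplitudes from an FFT spectrum is to pick the largest magnitude near each multiple of F0 and express tilt as a difference in dB. The sketch below illustrates that idea only; the function names, the Hann window, and the 20 Hz search band are assumptions, and the abstract does not say which computational method the authors settled on.

```python
import numpy as np

def harmonic_amplitudes_db(signal, sample_rate, f0, n_harmonics=4, search_hz=20.0):
    """Peak-pick the magnitude spectrum near each multiple of f0 (in dB)."""
    signal = np.asarray(signal, dtype=float)
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    amplitudes = []
    for k in range(1, n_harmonics + 1):
        # Search a narrow band around the k-th harmonic for its peak
        band = (freqs >= k * f0 - search_hz) & (freqs <= k * f0 + search_hz)
        amplitudes.append(20.0 * np.log10(spectrum[band].max() + 1e-12))
    return np.array(amplitudes)

def spectral_tilt_db(signal, sample_rate, f0):
    """Amplitude of the fundamental minus that of the second harmonic, in dB."""
    h = harmonic_amplitudes_db(signal, sample_rate, f0, n_harmonics=2)
    return h[0] - h[1]
```

Applied to an inverse-filtered waveform, a larger positive value indicates a source spectrum that falls off more steeply between the first two harmonics. The F0 value must be supplied by the caller, so errors in F0 estimation carry over directly into the measure.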