Research on the acoustic correlates of breathiness has been plagued by a lack of consistent findings across studies and low intra- and inter-rater agreement. Variability can arise from several sources, including: differences in stimulus types (recorded or synthesized); differences in speaker groups (for recorded stimuli) or in synthesis parameters (for synthesized stimuli); and differences in experimental methodologies (task type, number of repetitions, listener backgrounds and experience). This review discussed these sources of variability and described solutions that have the potential to address the variability and the inconsistencies often reported in the literature. A critical appraisal of the evidence about the relative importance of various acoustic measures resulted in the identification of measures of periodicity, noise content, and high-to-low frequency energy as the most likely acoustic correlates of breathiness.
Stimuli used in timbre perception studies must be controlled carefully in order to yield meaningful results. During psychoacoustic testing of individual timbre properties, (1) it must be ensured that timbre properties do not co-vary, as timbre properties are often not independent from one another, and (2) the potential influence of loudness, pitch, and perceived duration must be eliminated. A mathematical additive synthesis method is proposed which allows complete control over two spectral parameters, the spectral centroid (corresponding to brightness) and irregularity, and two temporal parameters, log rise-time (LRT) and a parameter characterizing the sustain/decay segment, while controlling for covariation in the spectral centroid and irregularity. Thirteen musical instrument sounds were synthesized. Perceptual data from six listeners indicate that variation in the four timbre properties mainly influences loudness and that perceived duration and pitch are not influenced significantly for the stimuli of longer duration (2 s) used here. Trends across instruments were found to be similar.
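The idea of controlling the spectral centroid in additive synthesis can be illustrated with a minimal sketch. It does not reproduce the method proposed in the abstract above, whose details are not given here. The power-law harmonic amplitudes, the amplitude-weighted centroid definition, the bisection search, and all function and parameter names are assumptions chosen for illustration.

```python
import numpy as np

def harmonic_amplitudes(n_harmonics, tilt):
    """Assumed amplitude law: the k-th harmonic has amplitude k ** (-tilt)."""
    k = np.arange(1, n_harmonics + 1, dtype=float)
    return k ** (-tilt)

def spectral_centroid(f0, amps):
    """Amplitude-weighted mean frequency of the harmonics, in Hz."""
    k = np.arange(1, len(amps) + 1, dtype=float)
    return f0 * np.sum(k * amps) / np.sum(amps)

def tilt_for_centroid(f0, n_harmonics, target_hz, lo=0.0, hi=10.0, iters=60):
    """Bisection search for the tilt that yields the target centroid.

    The centroid falls monotonically as the tilt grows, so the target must lie
    between the centroids obtained at tilt=hi and tilt=lo.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if spectral_centroid(f0, harmonic_amplitudes(n_harmonics, mid)) > target_hz:
            lo = mid  # centroid too high: steepen the spectrum
        else:
            hi = mid  # centroid too low: flatten the spectrum
    return 0.5 * (lo + hi)

def synthesize(f0, amps, duration_s=2.0, sample_rate=44100):
    """Sum sine partials at integer multiples of f0 and peak-normalize."""
    n_samples = int(round(duration_s * sample_rate))
    t = np.arange(n_samples) / sample_rate
    k = np.arange(1, len(amps) + 1, dtype=float)
    partials = amps[:, None] * np.sin(2 * np.pi * f0 * k[:, None] * t[None, :])
    signal = partials.sum(axis=0)
    return signal / np.max(np.abs(signal))

# Example: a 220 Hz tone with 20 harmonics and a centroid of 1000 Hz.
tilt = tilt_for_centroid(220.0, 20, 1000.0)
amps = harmonic_amplitudes(20, tilt)
tone = synthesize(220.0, amps)
```

Because this sketch has a single free parameter, it fixes only the centroid. Independent control of irregularity and of the temporal parameters, as described in the abstract, would need additional degrees of freedom.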
Previous research showed that aspiration noise difference limens in moderately breathy /a/ vowels decreased as the spectral slope of the glottal source spectrum became increasingly steep [Kreiman and Gerratt, J. Acoust. Soc. Am. 131(1), 492–500 (2012)]. The current study investigated whether discrimination of aspiration noise levels was affected by differences in spectral shape due to vowel quality (/æ/ and /i/) and speaker identity (three male speakers) when the slope of the glottal source spectrum was fixed. Discrimination performance was worse overall for /i/ than for /æ/, but this difference may have stemmed from relatively poor performance for the /i/ vowel of one speaker. Acoustic analyses of the stimuli were performed to estimate the association between acoustic properties and the perceptual outcomes. The analyses showed that both the smoothed cepstral peak prominence and the harmonic energy level between 2 and 5 kHz may account for the observed differences in aspiration noise discrimination among speakers within each vowel, but not for differences between vowel categories. The relationship between aspiration noise discrimination and the aforementioned acoustic properties may be modulated by the spectral distribution of energy across frequency.
The presence of noise is a salient cue to the perception of breathiness and aspiration in speech sounds. The detection of noise within harmonic series (maskers) composed of unresolved components was found to depend on the fundamental frequency (fo) and the overall level of the masker [Gockel, Moore, and Patterson (2002). J. Acoust. Soc. Am., 111 (6), 2759–2770]. In the present study, noise detection thresholds were measured as a function of the frequency range, the fo, and the overall level of harmonic maskers. Frequency range was specified in equivalent rectangular bandwidth (ERB) units (3–13, 13–23, 23–33, or 3–33 ERBs). The results were consistent with the idea that listeners rely on spectral cues when maskers comprise only resolved components (3–13 ERBs), and on temporal (dip listening) cues when maskers contain only unresolved components (23–33 ERBs). Noise detection thresholds were generally lower when masker level was high (70 dBA) than when it was low (50 dBA). Masker fo affected thresholds only when listeners relied on spectral cues for noise detection. With the wideband (3–33 ERBs) masker, listeners likely detected noise by focusing on the frequency band (23–33 ERBs) with the most advantageous noise-to-harmonic ratio.
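The ERB-number band edges above can be converted to frequencies in Hz with a short sketch. The abstract does not say which ERB formula was used; the code assumes the widely used Glasberg and Moore (1990) ERB-number scale, and the function names and the 100 Hz example fo are hypothetical.

```python
import math

def hz_to_erb_number(freq_hz):
    """ERB-number (Cams) for a frequency in Hz, Glasberg and Moore (1990) scale."""
    return 21.4 * math.log10(0.00437 * freq_hz + 1.0)

def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) / 0.00437

def harmonics_in_band(f0_hz, low_erb, high_erb):
    """Harmonic numbers of f0 whose frequencies fall inside the ERB band."""
    low_hz = erb_number_to_hz(low_erb)
    high_hz = erb_number_to_hz(high_erb)
    first = max(1, math.ceil(low_hz / f0_hz))
    last = math.floor(high_hz / f0_hz)
    return list(range(first, last + 1))

# Band edges from the abstract, expressed in Hz under the assumed formula.
for low, high in [(3, 13), (13, 23), (23, 33), (3, 33)]:
    print(f"{low}-{high} ERBs: "
          f"{erb_number_to_hz(low):.0f}-{erb_number_to_hz(high):.0f} Hz, "
          f"{len(harmonics_in_band(100.0, low, high))} harmonics at fo = 100 Hz")
```

Under this assumption the 3 ERB edge falls below 100 Hz and the 33 ERB edge lies several kilohertz higher, so a fixed fo places low-numbered harmonics in the lowest band and only high-numbered harmonics in the highest band.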