The CAPE-V form and instructions, included as appendices to this article, enable clinicians to document perceived voice quality deviations following a standard (i.e., consistent and specified) protocol.
A new descriptive framework for voice quality perception (Kreiman, Gerratt, Kempster, Erman, & Berke, 1993) states that when listeners rate a voice on some quality dimension (e.g., roughness), they compare the stimulus presented to an internal standard or scale. The framework further suggests that these internal standards are inherently unstable and may be influenced by factors other than the physical signal being judged; among these factors, context effects may cause drift in listeners' voice ratings by shifting the internal standard against which judgments are made. Hypothetically, substituting explicit, external standards for these unstable internal standards should improve listener reliability. To test these hypotheses, we asked 12 clinicians to judge the roughness of 22 synthetic stimuli using two scales: a traditional 5-point equal-appearing interval (EAI) scale and a scale with explicit anchor stimuli for each scale point. Because the stimulus set included a relatively large number of normal and mildly rough voices, we predicted that the perceived roughness of moderately rough stimuli would increase over time for the EAI ratings but not for the explicitly anchored ratings. Ratings made using the anchored scale were significantly more reliable than those gathered using the unanchored paradigm. Further, as predicted, ratings on the unanchored EAI scale drifted significantly within a listening session in the expected direction, but ratings on the anchored scale did not. These results are consistent with our framework and suggest that explicitly anchored paradigms for voice quality evaluation might improve both research and clinical practice.
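To make the drift analysis concrete, the sketch below simulates the kind of within-session trend the abstract describes: roughness ratings of moderately rough stimuli regressed on trial order, with a nonzero per-trial slope in the unanchored EAI condition and a flat trend in the anchored condition. The data, drift values, and function names are invented for illustration and are not taken from the study's stimuli or analysis code.

```python
# Minimal sketch, using simulated ratings rather than the study's data.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 60

def simulate_session(drift_per_trial):
    """Simulate one listener's roughness ratings across a session."""
    trials = np.arange(n_trials)
    base = rng.normal(3.0, 0.5, n_trials)            # moderately rough stimuli
    return trials, base + drift_per_trial * trials   # ratings creep upward when drift > 0

for label, drift in [("unanchored EAI", 0.01), ("anchored", 0.0)]:
    trials, ratings = simulate_session(drift)
    slope, intercept = np.polyfit(trials, ratings, 1)  # per-trial drift estimate
    print(f"{label}: estimated drift = {slope:.4f} scale points per trial")
```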
Modeling sources of listener variability in voice quality assessment is the first step in developing reliable, valid protocols for measuring quality, and provides insight into the reasons that listeners disagree in their quality assessments. This study examined the adequacy of one such model by quantifying the contributions of four factors to interrater variability: instability of listeners' internal standards for different qualities, difficulties isolating individual attributes in voice patterns, scale resolution, and the magnitude of the attribute being measured. One hundred twenty listeners in six experiments assessed vocal quality in tasks that differed in scale resolution, in the presence/absence of comparison stimuli, and in the extent to which the comparison stimuli (if present) matched the target voices. These factors accounted for 84.2% of the variance in the likelihood that listeners would agree exactly in their assessments. Providing listeners with comparison stimuli that matched the target voices doubled the likelihood that they would agree exactly. Listeners also agreed significantly better when assessing quality on continuous versus six-point scales. These results indicate that interrater variability is an issue of task design, not of listener unreliability.
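The abstract reports agreement as the likelihood that listeners agree exactly. One plausible way to operationalize that, shown below on simulated six-point ratings, is the average probability that a pair of listeners assigns a voice exactly the same rating; the matrix dimensions and scale are assumptions for illustration, not the study's data or analysis.

```python
# Hedged sketch of a pairwise exact-agreement metric on simulated ratings.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
n_listeners, n_voices = 20, 10
ratings = rng.integers(1, 7, size=(n_listeners, n_voices))  # six-point scale ratings

def exact_agreement(ratings):
    """Average probability that two listeners give a voice exactly the same rating."""
    pairs = combinations(range(ratings.shape[0]), 2)
    agree = [(ratings[i] == ratings[j]).mean() for i, j in pairs]
    return float(np.mean(agree))

print(f"exact agreement: {exact_agreement(ratings):.3f}")
```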
Five speech-language clinicians and five naive listeners rated the similarity of pairs of normal and dysphonic voices. Multidimensional scaling was used to determine the voice characteristics that were perceptually important for each voice set and listener group. Solution spaces were compared to determine if clinical experience affects perceptual strategies. Naive and expert listeners attended to different aspects of voice quality when judging the similarity of voices, for both normal and pathological voices. All naive listeners used similar perceptual strategies; however, individual clinicians differed substantially in the parameters they considered important when judging similarity. These differences were large enough to suggest that care must be taken when using data averaged across clinicians, because averaging obscures important aspects of an individual's perceptual behavior.
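The multidimensional scaling step can be sketched as follows. The study derived listener- and group-specific perceptual spaces (an individual-differences scaling approach); scikit-learn provides only standard metric MDS, so the example below is a simplified stand-in run on simulated pairwise dissimilarities rather than the study's similarity judgments.

```python
# Simplified MDS sketch on simulated dissimilarity data (not the study's method or data).
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
n_voices = 12

# Simulated pairwise dissimilarities: symmetric matrix with a zero diagonal.
d = rng.random((n_voices, n_voices))
dissim = (d + d.T) / 2
np.fill_diagonal(dissim, 0.0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)  # each voice becomes a point in a 2-D perceptual space
print(coords.shape)                 # (12, 2); dimensions would then be interpreted
                                    # against acoustic or clinical measures
```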