2015
DOI: 10.1121/1.4922174

Perceptual evaluation of voice source models

Abstract: Models of the voice source differ in their fits to natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices, and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners completed a visual sort-and-rate task to compare versions of each voice created with the different source models, and the results were analyzed using multidimensional scal…

Cited by 13 publications (5 citation statements); references 31 publications.

“…Direct comparison of VF kinematics obtained with laryngeal high-speed videoendoscopy and radiated pressure has shown that current models of speech production are not capable of fully capturing the complexity of the phenomena, especially under asymmetric VF conditions [3, 8]. Similar observations have been made when assessing the perceptual relevance of the model output for both normal [9] and asymmetric VF vibration [10]. Thus, outstanding issues exist for advancing physics-based descriptions of airflow, sound and tissue interactions in symmetric and asymmetric VF vibration.…”
Section: Introduction | Citation type: mentioning
confidence: 75%
“…In addition, many control parameters resist simple adjustments. For example, the parameters that control voice quality do so by changing the shape of glottal pulses, and there is no simple way to predict which changes in control parameters will achieve the desired change in the output spectrum (Kreiman, Garellek, Chen, Alwan, & Gerratt, 2015).…”
Section: How Does Soundgen Compare To the Alternatives? | Citation type: mentioning
confidence: 99%
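The point that glottal-pulse shape parameters map onto the output spectrum in a hard-to-predict way can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions: it uses a Rosenberg-style glottal flow pulse (a classic textbook model, not necessarily one of the five source models compared in this paper), the helper names `rosenberg_pulse` and `h1_minus_h2` are hypothetical, and H1-H2 is read directly from the DFT of one exact period of the flow pulse.

```python
import numpy as np


def rosenberg_pulse(n: int, open_quotient: float, speed_ratio: float = 2.0) -> np.ndarray:
    """One period of a Rosenberg-style glottal flow pulse (hypothetical helper).

    open_quotient: fraction of the period during which the glottis is open.
    speed_ratio:   duration of the opening phase divided by the closing phase.
    """
    t = np.arange(n) / n                                       # normalized time in [0, 1)
    t_open = open_quotient * speed_ratio / (1.0 + speed_ratio)  # end of opening phase
    t_close = open_quotient - t_open                            # length of closing phase
    flow = np.zeros(n)                                          # closed phase stays at zero
    rising = t < t_open
    flow[rising] = 0.5 * (1.0 - np.cos(np.pi * t[rising] / t_open))
    falling = (t >= t_open) & (t < open_quotient)
    flow[falling] = np.cos(0.5 * np.pi * (t[falling] - t_open) / t_close)
    return flow


def h1_minus_h2(pulse: np.ndarray) -> float:
    """H1-H2 in dB. For exactly one period, DFT bin k is harmonic k (bin 0 is DC)."""
    spectrum = np.abs(np.fft.rfft(pulse))
    return 20.0 * np.log10(spectrum[1] / spectrum[2])


# Sweep one time-domain control parameter and watch a spectral measure respond.
for oq in (0.4, 0.5, 0.6, 0.7, 0.8):
    print(f"open quotient {oq:.1f}: H1-H2 = {h1_minus_h2(rosenberg_pulse(1000, oq)):.2f} dB")
```

In this sketch H1-H2 rises as the open quotient grows, but the step size is not constant, and other parameters (here `speed_ratio`) shift the whole curve. Hitting a target spectral value therefore tends to require searching the parameter space rather than solving for it directly.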
“…In fact, even if a vocoder closely matches the shape of glottal pulses, it does not guarantee that the perceptually relevant spectral characteristics will be captured successfully. This is why it has been suggested that auditory perception in humans is better modeled in the frequency than time domain (Doval & d’Alessandro, 1997; Kreiman et al, 2015). Furthermore, despite all the diversity of sound sources in the animal world, the source-filter model (Fant, 1960) still holds across species (Goller, 2016; Taylor & Reby, 2010).…”
Section: How Does Soundgen Compare To the Alternatives? | Citation type: mentioning
confidence: 99%
“…¹ The source spectrum was then smoothed by fitting it with a four-piece model whose segments ranged from the first to the second harmonic (H1-H2), from H2 to the harmonic nearest 2 kHz (H2-2 kHz), from the harmonic nearest 2 kHz to that nearest 5 kHz (2-5 kHz), and from H2 to the harmonic nearest 5 kHz (H2-5 kHz). These segments were chosen because they capture most of the variability in source spectral shapes (Kreiman, Gerratt, & Antoñanzas-Barroso, 2007a), their individual perceptual importance has been established (Garellek, Keating, Esposito, & Kreiman, 2013; Kreiman & Garellek, 2011), and in combination, they appear to form an adequate psychoacoustic model of source contributions to voice quality (Garellek, Samlan, Gerratt, & Kreiman, 2016; Kreiman, Garellek, Chen, Alwan, & Gerratt, 2015; Kreiman, Gerratt, Garellek, Samlan, & Zhang, 2014). These measures were thus preferred to others found in the literature (jitter, shimmer, etc.)…”
Section: Acoustic Evaluation | Citation type: mentioning
confidence: 99%
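The four-segment spectral description in the last statement can be approximated as amplitude differences between harmonics. This is a simplified sketch under stated assumptions: it takes endpoint differences between harmonic amplitudes (in dB) rather than fitting the smoothed four-piece model the quoted authors describe, "nearest harmonic" is resolved by rounding `target / f0`, and the function names and input layout are hypothetical.

```python
import math


def harmonic_nearest(f0_hz: float, target_hz: float) -> int:
    """1-based index k of the harmonic whose frequency k * f0 is closest to target_hz.

    Hypothetical helper; exact ties fall to Python's round-half-to-even rule.
    """
    return max(1, round(target_hz / f0_hz))


def source_spectral_measures(f0_hz: float, harmonic_db: list[float]) -> dict[str, float]:
    """Four amplitude differences (dB) between harmonics of a source spectrum.

    harmonic_db[k - 1] holds the amplitude of harmonic k, so H1 is at index 0.
    Endpoint differences only: no smoothing or segment fitting is performed.
    """
    def amp(k: int) -> float:
        return harmonic_db[k - 1]

    k_2k = harmonic_nearest(f0_hz, 2000.0)  # harmonic nearest 2 kHz
    k_5k = harmonic_nearest(f0_hz, 5000.0)  # harmonic nearest 5 kHz
    if k_5k > len(harmonic_db):
        raise ValueError("need harmonic amplitudes up to the harmonic nearest 5 kHz")
    return {
        "H1-H2": amp(1) - amp(2),
        "H2-2 kHz": amp(2) - amp(k_2k),
        "2-5 kHz": amp(k_2k) - amp(k_5k),
        "H2-5 kHz": amp(2) - amp(k_5k),
    }


# Usage: a synthetic source with a -12 dB/octave roll-off and f0 = 200 Hz.
amps = [-12.0 * math.log2(k) for k in range(1, 26)]  # harmonics 1..25 (200 Hz..5 kHz)
print(source_spectral_measures(200.0, amps))
```

With plain endpoint differences, "H2-5 kHz" is exactly the sum of "H2-2 kHz" and "2-5 kHz". A fitted, smoothed model such as the one quoted above need not reduce to this identity.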