2010 IEEE International Conference on Acoustics, Speech and Signal Processing 2010
DOI: 10.1109/icassp.2010.5495030

A new voice source model based on high-speed imaging and its application to voice source estimation

Abstract: Numerous models of varying complexity seek to represent the voice source signal efficiently. These models are typically based on data and observations from air-flow masks, electroglottographs, mechanical systems, and the inverse filtering of speech signals. The first part of this study examines observations from high-speed imaging of the larynx and proposes a new source model, which is shown to fit the observed data better than existing models. The proposed sou…

Cited by 22 publications (23 citation statements)
References 15 publications
“…Studies to date suggest that the relationship between H1-H2 and OQ may be variable, but model fit to empirical pulse shapes is not such that definitive conclusions can be drawn. Because existing empirical data are not sufficient to clarify this situation, this study examined the relationship between H1*-H2* (measured from recorded acoustic signals), OQ, and the asymmetry coefficient (the length of the opening phase relative to the open phase, e.g., Henrich et al., 2001; Shue and Alwan, 2010), measured synchronously from high-speed video images of the vibrating vocal folds. Note that previous work on this topic has used models of the glottal flow, which may differ in pulse skewness from the glottal area functions measured here (e.g., Howe and McGowan, 2007).…”
Section: Introduction
Confidence: 99%
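As a rough illustration of the two timing measures named in this statement, the sketch below estimates an open quotient and an asymmetry coefficient from one cycle of a sampled glottal area waveform. The function name, the thresholding rule, and the sample-counting convention are assumptions made for this example and are not taken from the cited papers. OQ is taken here in its usual sense (open phase relative to the cycle length), and the asymmetry coefficient follows the definition quoted above (opening phase relative to open phase).

```python
def open_quotient_and_asymmetry(area, threshold=0.0):
    """Estimate (OQ, asymmetry) from one cycle of glottal area samples.

    Hypothetical helper: `area` holds the samples of exactly one cycle,
    and the glottis counts as open wherever area > threshold.
    """
    open_idx = [i for i, a in enumerate(area) if a > threshold]
    if not open_idx:
        raise ValueError("glottis never opens in this cycle")

    start, end = open_idx[0], open_idx[-1]
    # First sample of maximum area marks the end of the opening phase.
    peak = max(range(start, end + 1), key=lambda i: area[i])

    open_len = end - start + 1      # samples in the open phase
    opening_len = peak - start + 1  # samples from first opening to the peak

    oq = open_len / len(area)           # open phase relative to the cycle
    asymmetry = opening_len / open_len  # opening phase relative to open phase
    return oq, asymmetry


cycle = [0, 0, 1, 2, 3, 4, 3, 0, 0, 0]
oq, asymmetry = open_quotient_and_asymmetry(cycle)
```

Counting samples gives only a coarse estimate. Real high-speed video data would need cycle segmentation and a noise-aware threshold, which this sketch leaves out.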
“…The LF model (Fant et al., 1985) combines an exponentially increasing sinusoidal function and an exponential function with one amplitude parameter (E_e) and three time points; the Fujisaki-Ljungqvist model (Fujisaki and Ljungqvist, 1986) uses polynomials to model the shape and duration of different segments of the flow derivative waveform. Recent studies (Shue and Alwan, 2010; Chen et al., 2012) have proposed models of the glottal area waveform (as derived from high-speed endoscopic recordings of the laryngeal vibrations) rather than the flow pulse or its derivative. With four parameters, the first of these (Shue and Alwan, 2010) uses a combination of sinusoidal and exponential functions similar to the LF model, but with the ability to adjust the slopes of the opening and closing phases separately.…”
Section: A. The Source Models
Confidence: 99%
“…This paper assesses the relationship between model fit and perceptual accuracy by studying five time-domain source models (three of which are related) - the Rosenberg model (Rosenberg, 1971), the Fujisaki-Ljungqvist model (Fujisaki and Ljungqvist, 1986), the Liljencrants-Fant (LF) model (Fant et al., 1985), and two models proposed by Alwan and colleagues (Shue and Alwan, 2010; Chen et al., 2012) - and one model that describes the voice source in the spectral domain (Table I; Kreiman et al., 2014; see also Cummings and Clements, 1995). The Rosenberg model (Rosenberg, 1971), in contrast to the other models, describes the opening and closing phases of the glottal flow volume velocity with separate trigonometric functions that incorporate two timing parameters and one amplitude parameter.…”
Section: A. The Source Models
Confidence: 99%
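To make the Rosenberg description above concrete, here is a minimal sketch of one commonly cited trigonometric pulse shape: a raised-cosine opening phase followed by a quarter-cosine closing phase, controlled by two timing parameters and one amplitude parameter. The function name, the parameter names, and the choice to express timings as fractions of the period are assumptions for this example. The exact variant used in the citing study may differ.

```python
import math


def rosenberg_pulse(n_samples, opening_frac, closing_frac, amplitude=1.0):
    """Sample one period of a trigonometric Rosenberg-style glottal pulse.

    Hypothetical parameterisation: opening_frac and closing_frac are the
    opening- and closing-phase durations as fractions of the period.
    """
    if opening_frac <= 0 or closing_frac <= 0 or opening_frac + closing_frac > 1:
        raise ValueError("phase fractions must be positive and sum to at most 1")

    pulse = []
    for n in range(n_samples):
        t = n / n_samples  # normalised time within the period, in [0, 1)
        if t <= opening_frac:
            # Opening phase: raised cosine rising from 0 to the peak amplitude.
            g = 0.5 * amplitude * (1.0 - math.cos(math.pi * t / opening_frac))
        elif t <= opening_frac + closing_frac:
            # Closing phase: quarter cosine falling from the peak back to 0.
            g = amplitude * math.cos(0.5 * math.pi * (t - opening_frac) / closing_frac)
        else:
            g = 0.0  # closed phase
        pulse.append(g)
    return pulse


pulse = rosenberg_pulse(n_samples=100, opening_frac=0.4, closing_frac=0.2)
```

With these values the flow rises for 40% of the period, falls for 20%, and stays at zero for the rest, so the opening phase is twice as long as the closing phase.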