2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
DOI: 10.1109/icassp.2000.861811
A novel approach to the fully automatic extraction of Fujisaki model parameters

Cited by 132 publications (100 citation statements)
References 5 publications
“…Functional combinations that occur more than once are combined, so that there is no redundancy of functional representation. Note that such functional combination and boundary projection is an alternative to the superposition approach, which requires F0-contour decomposition before parameter extraction, extraction of two separate sets of parameters during training, and algorithmic summation during synthesis (Bailly and Holm, 2005; Fujisaki et al., 2005; Mixdorff et al., 2003). Here, for each functional combination at the smallest temporal unit, only a single set of parameters needs to be learned directly from the original (i.e., non-decomposed) F0 contours during training and used during synthesis.…”
Section: Functional Annotation
confidence: 99%
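The superposition approach contrasted in the statement above sums phrase and accent components in the log-F0 domain. As a rough illustration only (this is a minimal sketch of the standard Fujisaki command-response model, not the cited paper's parameter extractor; the function names and the default values of alpha, beta, and gamma are illustrative textbook choices, not taken from the source):

```python
import numpy as np

def phrase_response(t, alpha=2.0):
    # Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0
    tc = np.clip(t, 0.0, None)
    return np.where(t >= 0, alpha**2 * tc * np.exp(-alpha * tc), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    # Ga(t) = min[1 - (1 + beta*t) * exp(-beta*t), gamma] for t >= 0, else 0
    tc = np.clip(t, 0.0, None)
    g = 1.0 - (1.0 + beta * tc) * np.exp(-beta * tc)
    return np.where(t >= 0, np.minimum(g, gamma), 0.0)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i)
                        + sum_j Aa_j * [Ga(t - T1_j) - Ga(t - T2_j)]"""
    lnf0 = np.full_like(np.asarray(t, dtype=float), np.log(fb))
    for t0, ap in phrase_cmds:          # (onset time, phrase command magnitude)
        lnf0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:      # (onset, offset, accent command amplitude)
        lnf0 += aa * (accent_response(t - t1, beta, gamma)
                      - accent_response(t - t2, beta, gamma))
    return np.exp(lnf0)
```

The synthesis side is just this algorithmic summation; the difficulty addressed by the cited paper is the inverse problem of recovering the commands from a measured F0 contour.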
“…Among the various aspects of prosody, fundamental frequency (F0) is by far the most challenging, and has attracted most of the research effort. Many theories and computational models of F0 patterns have been proposed over the years (Anderson et al., 1984; Bailly and Holm, 2005; Black and Hunt, 1996; Fujisaki et al., 2005; Grabe et al., 2007; Hirst, 2005, 2011; Jilka et al., 1999; Kochanski and Shih, 2003; Mixdorff et al., 2003; Pierrehumbert, 1980, 1981; Prom-on et al., 2009; Taylor, 2000; van Santen and Möbius, 2000; Xu and Wang, 2001; Xu, 2005), and a large number of empirical studies have been conducted (as reviewed by Wagner and Watson, 2010; Shattuck-Hufnagel and Turk, 1996; Xu, 2011). Despite the extensive effort, however, most of the critical issues still remain unresolved and some are still under heated debate (Arvaniti and Ladd, 2009; Ladd, 2008; Wagner and Watson, 2010; Wightman, 2002; Xu, 2011).…”
Section: Introduction
confidence: 99%
“…Such features include spectral, temporal, perceptual, short-time energy, and MPEG-7 standard descriptors [5]. Moreover, motivated by the high speech emotion recognition accuracy reported in [67] when Fujisaki's model parameters [38] are considered, these features are also tested here, as can be seen in Table 7. In addition, jitter and shimmer are listed in Table 8.…”
Section: -161
confidence: 99%
“…We will assess the comparative performance of our algorithm with the results obtained with the CR parameter extraction tool of Mixdorff (2000). We will calculate the WCORR norm obtained with the CR model, and use it to assess the perceptual quality of the modelled contour, comparing it with our WCAD results.…”
Section: Experiments Design
confidence: 99%
“…Most intonation models use discontinuous pitch trackers and then interpolate unvoiced regions using, for instance, spline interpolation (Yu and Young, 2011). Then they concentrate the effort of modelling on the voiced parts (e.g., Mixdorff, 2000; Narusawa et al., 2002) or simply model everything equally (e.g., Hirst et al., 2000; Taylor, 2000). To avoid the latter, one needs a way of assessing which parts of the F0 contour are perceptually relevant.…”
Section: Introduction
confidence: 99%
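The interpolation step mentioned in the statement above (filling unvoiced gaps of a discontinuous pitch track before modelling) can be sketched as follows. This is an illustrative example using SciPy's `CubicSpline`; the helper name and the convention that unvoiced frames are marked with NaN or 0 Hz are assumptions for the sketch, not the cited authors' code:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_unvoiced(times, f0):
    """Fill unvoiced frames (NaN or 0 Hz) of a discontinuous F0 track
    with a cubic spline fitted through the voiced frames only."""
    times = np.asarray(times, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    voiced = np.isfinite(f0) & (f0 > 0)       # unvoiced frames are NaN or 0
    spline = CubicSpline(times[voiced], f0[voiced])
    filled = f0.copy()
    filled[~voiced] = spline(times[~voiced])  # evaluate only in the gaps
    return filled
```

Note that a cubic spline can overshoot in long unvoiced gaps, which is one reason the cited work argues for assessing which parts of the contour are perceptually relevant rather than treating interpolated regions on par with measured ones.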