A scheme for pitch extraction of speech using autocorrelation function with frame length proportional to the time lag

Hirose, K.; Fujisaki, Hiroya; Seto, Shigenobu

doi:10.1109/icassp.1992.225950

Cited by 15 publications

(5 citation statements)

References 6 publications


“…The F₀ contours were extracted by the modified autocorrelation analysis of the LPC residual [Hirose et al, 1992]. Syllable boundaries and rhyme boundaries were marked manually by visual inspection of the waveform and the spectrogram.…”

Section: Speech Data (mentioning)

confidence: 99%

Analysis of Tones in Cantonese Speech Based on the Command-Response Model

Gu, Hirose, Fujisaki (2007). Phonetica.


As one of the major Chinese dialects, Cantonese has a tone system consisting of nine lexical tones and three additional changed tones, which is considerably more complex than that of Mandarin. The most important acoustic feature characterizing these tones is the contour of the voice fundamental frequency (the F₀ contour). In this article we present an approach to modeling F₀ contours of Cantonese utterances, based on an extension of the command-response model. Analysis-by-synthesis of F₀ contours of the utterances with a fixed carrier frame, in which a target syllable with each tone type is embedded, shows that each tone type can be represented by a specific pattern (polarity, timing, and amplitude) of tone commands. These patterns are found to be essentially maintained in F₀ contours of the utterances with unconstrained text. With the definition of these tone command patterns, the command-response model not only provides a novel phonological description of tones, but also gives high accuracy of approximations to F₀ contours of Cantonese utterances and allows one to analyze various tonal phenomena in quantitative terms. Quantitative distinctions between various tones are then revealed by statistical analysis of the timing and amplitude of tone commands. Especially, systematic alignment in timing is found between the onsets/offsets of tone commands and the rhyme of a syllable, and hence a set of constraints can be introduced, which together with those on tone command amplitudes and phrase command parameters, is then applied for generating F₀ contours of Cantonese utterances. The validity of the approach is verified by perceptual evaluation of the synthetic speech stimuli with model-generated F₀ contours, both on the intelligibility of tones and on the naturalness of prosody.


“…The relationship between frame lengths and the f0 of a speaker is complicated due to the inherent variation of frequency profiles from one speaker to the next (Hirose, Fujisaki, & Seto, 1992), but managing speaker specific analysis settings individuality requires extensive expertise and time and is impractical for large volumes of data. If the pitch floor is set it too low, very fast f0 changes will be missed, and if it is set too high, low f0 values will be neglected.…”

Section: Discussion (mentioning)

confidence: 99%

Standardization of pitch-range settings in voice acoustic analysis

Vogel, Maruff, Snyder, et al. (2009). Behavior Research Methods.


Voice acoustic analysis is typically a labor-intensive, time-consuming process that requires the application of idiosyncratic parameters tailored to individual aspects of the speech signal. Such processes limit the efficiency and utility of voice analysis in clinical practice as well as in applied research and development. In the present study, we analyzed 1,120 voice files, using standard techniques (case-by-case hand analysis), taking roughly 10 work weeks of personnel time to complete. The results were compared with the analytic output of several automated analysis scripts that made use of preset pitch-range parameters. After pitch windows were selected to appropriately account for sex differences, the automated analysis scripts reduced processing time of the 1,120 speech samples to less than 2.5 h and produced results comparable to those obtained with hand analysis. However, caution should be exercised when applying the suggested preset values to pathological voice populations.
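The effect of a preset pitch range, as discussed in the citation statement and abstract above, can be illustrated with a minimal sketch. It assumes a plain normalized autocorrelation estimator, which is not the scheme of Hirose, Fujisaki, and Seto (1992) and not the scripts used by Vogel et al.; the function names, the synthetic 100 Hz test tone, and the floor and ceiling values are all hypothetical choices made for illustration.

```python
import math


def synth_tone(f0, sample_rate, n_samples):
    """Hypothetical test signal: a fundamental plus a weaker second harmonic."""
    return [
        math.sin(2 * math.pi * f0 * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate)
        for i in range(n_samples)
    ]


def estimate_f0(signal, sample_rate, pitch_floor, pitch_ceiling):
    """Pick the lag with the highest normalized autocorrelation.

    Only lags between sample_rate / pitch_ceiling and
    sample_rate / pitch_floor are searched, so the preset pitch range
    decides which periods can be found at all.
    """
    min_lag = int(sample_rate / pitch_ceiling)  # shortest period searched
    max_lag = int(sample_rate / pitch_floor)    # longest period searched
    best_lag, best_score = None, -2.0
    for lag in range(min_lag, max_lag + 1):
        n = len(signal) - lag
        if n <= 0:
            break
        num = sum(signal[i] * signal[i + lag] for i in range(n))
        e_a = sum(signal[i] ** 2 for i in range(n))
        e_b = sum(signal[i + lag] ** 2 for i in range(n))
        den = math.sqrt(e_a * e_b)
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


sample_rate = 8000
tone = synth_tone(100.0, sample_rate, 400)  # 50 ms of a 100 Hz voice-like tone

# A range that contains the true F0 recovers it.
wide = estimate_f0(tone, sample_rate, pitch_floor=75.0, pitch_ceiling=300.0)

# A floor above the true F0 excludes the correct lag from the search.
narrow = estimate_f0(tone, sample_rate, pitch_floor=150.0, pitch_ceiling=300.0)

print("floor 75 Hz: ", wide)
print("floor 150 Hz:", narrow)
```

With the floor set to 150 Hz, the 80-sample period of the 100 Hz tone lies outside the searched lags, so the estimator can only return a value above the floor. This is the "low f0 values will be neglected" failure mode named in the citation statement.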


“…B: Reading of a chapter of a book by another male speaker (consisting of 85 sentences that are longer on the average than those of Speech Material A) recorded from a radio program "From My Bookshelf" by the Japan Broadcasting Corporation (NHK). These speech signals were digitized at 10 kHz with 16-bit precision, and the fundamental frequency was extracted by a modified autocorrelation analysis of the LPC residual signal [8]. the Japanese utterance: "Ikutsukano otodake sokokara shakuyooshite, raibuno fun'ikio sokonawazuni henshuusuru kotoga dekiru."…”

Section: Speech Materials (mentioning)

confidence: 99%

Evaluation of an improved method for automatic extraction of model parameters from fundamental frequency contours of speech

Narusawa, Minematsu, Hirose, et al. (2004). Speech Prosody 2004.


The authors have already presented a method for automatic extraction of accent and phrase commands of a model from a given F0 contour of speech. This paper describes improvements introduced to cope with difficulties encountered by the previous method, especially in connection with the extraction of accent commands, and reports the results of experiments conducted for the evaluation of the current method using two sets of speech materials differing in sentence length and syntactic complexity. It is shown that the method works quite well for the majority of utterances tested. Analysis of performance in terms of misses and false insertions of commands indicates that the performance is slightly better for shorter utterances, and that most of the errors are related to commands of smaller magnitude/amplitude, suggesting that their effects on the perception of naturalness of prosody are of minor importance.
