Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech

Jackson, Philip J. B.; Shadle, Christine H.

doi:10.1109/89.952489

Cited by 69 publications

(38 citation statements)

References 48 publications

“…Harmonics-to-noise ratio (HNR): A spectral measure of harmonics-to-noise ratio was performed using a periodic/noise decomposition method that employs a comb filter to extract the harmonic component of a signal [17][18][19]. This "pitch-scaled harmonic filter" approach uses an analysis window duration equal to an integer number of local periods (four in the current work) and relies on the property that harmonics of the fundamental frequency exist at specific frequency bins of the short-time discrete Fourier transform (DFT).…”

Section: Voice Source Properties (mentioning)

confidence: 99%

Vocal and Facial Biomarkers of Depression based on Motor Incoordination and Timing

Williamson

Quatieri

Helfer

et al. 2014

Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge

In individuals with major depressive disorder, neurophysiological changes often alter motor control and thus affect the mechanisms controlling speech production and facial expression. These changes are typically associated with psychomotor retardation, a condition marked by slowed neuromotor output that is behaviorally manifested as altered coordination and timing across multiple motor-based properties. Changes in motor outputs can be inferred from vocal acoustics and facial movements as individuals speak. We derive novel multi-scale correlation structure and timing feature sets from audio-based vocal features and video-based facial action units from recordings provided by the 4th International Audio/Video Emotion Challenge (AVEC). The feature sets enable detection of changes in coordination, movement, and timing of vocal and facial gestures that are potentially symptomatic of depression. Combining complementary features in Gaussian mixture model and extreme learning machine classifiers, our multivariate regression scheme predicts Beck depression inventory ratings on the AVEC test set with a root-mean-square error of 8.12 and mean absolute error of 6.31. Future work calls for continued study into detection of neurological disorders based on altered coordination and timing across audio and video modalities.
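The citation statement for this entry describes a window spanning an integer number of pitch periods, so that harmonics of the fundamental land on specific DFT bins. The sketch below illustrates only that bin-selection property. It assumes a rectangular window, an exactly known period, and hypothetical function names (`harmonic_noise_split`, `hnr_db`); it is not the published pitch-scaled harmonic filter and omits any further refinement steps that method may apply.

```python
import numpy as np

def harmonic_noise_split(frame, periods_in_window=4):
    """Split a frame that spans exactly `periods_in_window` pitch periods.

    With b periods in the window, harmonics of f0 fall on DFT bins
    b, 2b, 3b, ... Everything else (including DC) is treated as noise.
    Assumption: rectangular window and an exact integer number of periods.
    """
    frame = np.asarray(frame, dtype=float)
    b = periods_in_window
    spectrum = np.fft.rfft(frame)

    # Comb filter in the frequency domain: keep only bins at multiples of b.
    harmonic_spectrum = np.zeros_like(spectrum)
    harmonic_spectrum[b::b] = spectrum[b::b]

    harmonic = np.fft.irfft(harmonic_spectrum, n=len(frame))
    noise = frame - harmonic
    return harmonic, noise

def hnr_db(harmonic, noise):
    """Harmonics-to-noise ratio in dB from the two time-domain estimates."""
    return 10.0 * np.log10(np.sum(harmonic ** 2) / np.sum(noise ** 2))

# Usage on a synthetic frame: period of 80 samples, four periods per window.
n = np.arange(4 * 80)
voiced = np.sin(2 * np.pi * n / 80) + 0.5 * np.sin(2 * np.pi * 2 * n / 80)
noisy = voiced + 0.1 * np.random.default_rng(0).standard_normal(n.size)
harmonic_est, noise_est = harmonic_noise_split(noisy, periods_in_window=4)
print(f"estimated HNR: {hnr_db(harmonic_est, noise_est):.1f} dB")
```

Noise energy that happens to fall in the harmonic bins is assigned to the harmonic estimate in this sketch, so the resulting ratio is biased slightly upward.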

“…It discriminates words in tonal languages, allows expressing emotions, discriminates questions from statements, and allows emphasizing parts of an utterance. Furthermore, pitch tracking is the basis for the separation of harmonic speech from other speech components and background noise [1].…”

Section: Introduction (mentioning)

confidence: 99%

Pitch Estimation using Models of Voiced Speech on Three Levels

Joho

Bennewitz

Behnke

2007

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07

We present an algorithm for estimating the fundamental frequency in speech signals. Our approach incorporates models of voiced speech on three levels. First, we estimate the pitch for each time frame based on its harmonic structure using non-negative matrix factorization. The second level utilizes temporal pitch continuity to extract partial pitch contours. Thirdly, we incorporate statistics of the succession of voiced segments to aggregate partial contours to the final contour of an utterance. We evaluate our approach on the Keele database. The experimental results show the robustness of our method for noisy speech, and the good performance for clean speech in comparison with state-of-the-art algorithms.

“…If the signal is judged unvoiced, then indirect measures such as zero crossing rate and the ratio of low- to high-frequency energy are used to determine if the signal contains noise. In [3], estimates of simultaneous voiced and turbulence-noise components in the speech signal are obtained, but the performance of the system relies on accurate estimates of the pitch period. However, pitch estimation is a difficult task that is prone to errors (pitch doubling and pitch halving).…”

Section: Introduction (mentioning)

confidence: 99%

A measure of aperiodicity and periodicity in speech

Deshmukh

Wilson

2003

2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)

In this paper, we discuss a direct measure for aperiodic energy and periodic energy in speech signals. Most measures for aperiodicity have been indirect, such as zero crossing rate, high-frequency energy and the ratio of high-frequency energy to low-frequency energy. Such indirect measurements will usually fail in situations where there is both strong periodic and aperiodic energy in the speech signal, as in the case of some voiced fricatives or when there is a need to distinguish between high frequency periodic versus high frequency aperiodic energy. We propose an AMDF-based temporal method to estimate directly the amount of periodic and aperiodic energy in the speech signal. The algorithm also gives an estimate of the pitch period in periodic regions.
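The abstract above relies on the average magnitude difference function (AMDF). As a rough illustration of that building block only, the sketch below computes a textbook AMDF and picks the lag with the deepest dip as the period estimate. The function names and the lag search range are assumptions made for this example, and the sketch does not reproduce the periodic/aperiodic energy measure proposed in the paper.

```python
import numpy as np

def amdf(frame, min_lag, max_lag):
    """Average magnitude difference function over lags min_lag..max_lag.

    For a periodic signal the AMDF dips toward zero at lags equal to
    integer multiples of the period. min_lag must be at least 1.
    """
    frame = np.asarray(frame, dtype=float)
    lags = np.arange(min_lag, max_lag + 1)
    values = np.array(
        [np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags]
    )
    return lags, values

def estimate_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the smallest AMDF value."""
    lags, values = amdf(frame, min_lag, max_lag)
    return int(lags[np.argmin(values)])

# Usage on a synthetic periodic frame with a period of 80 samples.
n = np.arange(640)
frame = np.sin(2 * np.pi * n / 80) + 0.3 * np.sin(2 * np.pi * 3 * n / 80)
print(estimate_period(frame, min_lag=40, max_lag=120))
```

If the search range is widened to include twice the true period, that lag produces an equally deep dip, which is one way octave errors of the kind mentioned in the citation statement can arise.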
