The glottal-to-noise excitation ratio (GNE) is an acoustic measure designed to assess the amount of noise in a pulse train generated by the oscillation of the vocal folds. So far its properties have only been studied for synthesized signals, where it was found to be independent of variations of fundamental frequency (jitter) and amplitude (shimmer). In contrast, other features designed for the same purpose, such as NNE (normalized noise energy) or CHNR (cepstrum-based harmonics-to-noise ratio), did not show this independence. This advantage of the GNE over NNE and CHNR, as well as its general applicability in voice quality assessment, is now tested for real speech using a large group of pathologic voices (n = 447). A set of four acoustic features is extracted from a total of 22 mostly well-known acoustic voice quality measures by correlation analysis, mutual information analysis, and principal components analysis. Three of these measures are chosen to assess primarily different aspects of signal aperiodicity, while the fourth one indicates the noise content of the signal. All analysis methods lead to the same feature set, which consists of a measure of period correlation, jitter, shimmer, and GNE. The two-dimensional projection of this set, named the "hoarseness diagram," allows a graphical illustration of voice quality that can be easily interpreted.
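As a rough illustration of the two perturbation measures named above, the sketch below computes jitter and shimmer as the mean absolute difference between consecutive cycles, relative to the mean value. This is one common "local" definition and is an assumption here; the abstract does not state which definitions the authors used. The function name `relative_perturbation` and the sample values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean, in percent.

    Assumed definition (not taken from the abstract): applied to period
    lengths it gives a local jitter, applied to peak amplitudes a local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical per-cycle measurements from a sustained vowel.
periods_ms = [5.0, 5.1, 4.9, 5.0]   # glottal cycle lengths
amplitudes = [1.0, 0.9, 1.1, 1.0]   # per-cycle peak amplitudes

jitter_percent = relative_perturbation(periods_ms)
shimmer_percent = relative_perturbation(amplitudes)
```

Both measures describe cycle-to-cycle irregularity rather than additive noise, which is why a noise measure that is insensitive to them, as reported for the GNE on synthesized signals, carries separate information.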
The hoarseness diagram (Michaelis, Fröhlich, & Strube, 1998a) has been proposed as a new approach to describe different acoustic properties of voices. To test its performance in the analysis of pathologically disturbed and normal voices, five requirements are suggested that should be met by any acoustic voice-analysis protocol to be used in voice research and clinical practice. The hoarseness diagram is then tested with regard to these requirements. Individual voices are found to show a satisfactory localization in the diagram. Aspects of stationarity are discussed in the context of four case studies. The different cases illustrate that changes in the acoustic analysis results are observed if the voice-generation conditions change, whereas results are stationary if phonation conditions do not change. Different pathological voice groups, defined on the basis of the specific phonation mechanism, are found to map to specific regions of the hoarseness diagram, with differences between group locations being significant. All results can be interpreted without exception if the two hoarseness diagram coordinates are taken to reflect the vibrational irregularity of the voice-generation mechanisms on the one hand and the degree of closure of the vibrating structures on the other. The hoarseness diagram and its underlying algorithms are thus shown to constitute a useful approach to acoustic voice analysis in research and clinical practice. The tests themselves demonstrate several application possibilities, including the quantitative monitoring of individual voices.
Linear prediction is considered with respect to a nonlinear frequency scale obtained by a first-order all-pass transformation. The predictor can be computed from a frequency-warped autocorrelation function obtained from the power spectrum or by a direct linear transformation of the original autocorrelation function. Three numerical procedures are compared. Alternatively, the predictor can be determined from a covariance matrix or (adaptively) from continuously formed correlations, suitably defined according to the all-pass transformation. Prediction-error minimization and spectral flattening are no longer equivalent criteria. In the synthesis part of a vocoder or APC system, no inverse transformation is required, since the direct form of the analysis and synthesis filters can be modified so as to immediately realize the warped transfer function. Single-word intelligibility is compared for a predictive vocoder on a "Bark" scale and a linear frequency scale. The Bark scale yields results around 90% even at predictor orders of 5 to 7. More possible applications have been given previously by other authors.
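The idea of forming correlations "according to the all-pass transformation" can be sketched as follows: each unit delay of ordinary linear prediction is replaced by a first-order all-pass section, the frame is correlated with its successively all-pass-filtered versions, and the resulting sequence is passed to a standard Levinson-Durbin recursion. This is a minimal time-domain sketch under stated assumptions, not necessarily one of the three numerical procedures compared in the paper. The all-pass form D(z) = (z^-1 - lam) / (1 - lam z^-1), the function names, and the value lam = 0.5 are assumptions chosen for illustration.

```python
import math


def allpass(x, lam):
    """Filter x through D(z) = (z^-1 - lam) / (1 - lam * z^-1), zero initial state."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for xn in x:
        yn = -lam * xn + prev_x + lam * prev_y
        y.append(yn)
        prev_x, prev_y = xn, yn
    return y


def warped_autocorrelation(x, order, lam):
    """r[k] = sum_n x[n] * (D^k x)[n]; for lam = 0 this is the ordinary autocorrelation."""
    r, y = [], list(x)
    for _ in range(order + 1):
        r.append(sum(a * b for a, b in zip(x, y)))
        y = allpass(y, lam)  # advance one "warped delay"
    return r


def levinson(r, order):
    """Levinson-Durbin recursion; returns error-filter coefficients [1, a1, ...] and error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Hypothetical test frame: two sinusoids; lam = 0.5 is an arbitrary warping factor.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(64)]
r_warped = warped_autocorrelation(frame, 4, 0.5)
coeffs, residual_energy = levinson(r_warped, 4)
```

The coefficients obtained this way describe a predictor on the warped frequency axis, so they would be used in a filter whose delays are likewise replaced by the all-pass section, in line with the abstract's remark that no inverse transformation is needed at synthesis.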
For vowels excited by vigorous glottal vibrations, the instant of glottal closure is tentatively identified with the moment of strongest excitation (at not too low frequencies) and worst linear predictability. Some predictor methods for its determination are reviewed, which do not always yield reliable and unequivocal results. Then Sobakin's method using the determinant of the autocovariance matrix is examined critically and reinterpreted such that the determinant is maximum if the beginning of the interval on which the autocovariance matrix is calculated coincides with the glottal closure. This hypothesis is tested by comparison with the predictor methods and by looking at the inversely filtered waveforms and the formants obtained from predictors determined on a shifted interval. The determinant method seems to be very reliable even for otherwise difficult cases, such as the vowel /u/.
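The determinant criterion can be illustrated with a small sketch: for each candidate start position, an autocovariance matrix is formed over a short interval, and the start position with the largest determinant is taken as the estimate of the excitation instant. The matrix size (order + 1), the interval length, the synthetic two-pole test signal, and all function names are assumptions made for this example; the abstract does not specify them.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return det


def covariance_determinant(s, start, length, order):
    """det of phi[i][j] = sum over n in [start, start+length) of s[n-i] * s[n-j]."""
    def sample(n):
        return s[n] if 0 <= n < len(s) else 0.0

    phi = [[sum(sample(n - i) * sample(n - j) for n in range(start, start + length))
            for j in range(order + 1)] for i in range(order + 1)]
    return determinant(phi)


def strongest_excitation(s, length, order):
    """Return the interval start that maximizes the determinant, plus all determinants."""
    starts = list(range(order, len(s) - length + 1))
    dets = [covariance_determinant(s, n0, length, order) for n0 in starts]
    best = max(range(len(dets)), key=dets.__getitem__)
    return starts[best], dets


# Synthetic signal (assumption): a damped two-pole resonance excited by one impulse at n = 20.
r, w = 0.9, 0.3 * math.pi
signal = [0.0] * 60
for n in range(60):
    excitation = 1.0 if n == 20 else 0.0
    s1 = signal[n - 1] if n >= 1 else 0.0
    s2 = signal[n - 2] if n >= 2 else 0.0
    signal[n] = 2.0 * r * math.cos(w) * s1 - r * r * s2 + excitation

closure_estimate, _ = strongest_excitation(signal, length=10, order=2)
```

In this idealized case the signal is exactly predictable once the interval lies wholly inside the free decay, so the determinant collapses toward zero there and peaks when the interval begins at the excitation. Real vowels have noise, source-tract interaction, and a non-impulsive excitation, so the peak is less sharp than in this toy signal.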