SUMMARY A combination of mutually complementary features is necessary to cope with the various changes in pattern classification between normal and pathological voices. This paper proposes a method to improve pathological/normal voice classification performance by combining heterogeneous features. Different combinations of auditory-based and higher-order features are investigated. Their performance is measured with Gaussian mixture models (GMMs), linear discriminant analysis (LDA), and a classification and regression tree (CART) method. The proposed CART-based classification method is shown to be effective for pathological voice detection, with a classification rate of 92.7%. In terms of error reduction, this is an improvement of 54.32% over the MFCC-based GMM algorithm.
key words: pathological voice detection, heterogeneous feature combination, mel frequency filter bank energies, higher-order statistics, pattern classification algorithm
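The 54.32% figure is most naturally read as a relative reduction in error rate, not a gain in accuracy points. The Python sketch below works through that arithmetic under this assumed reading. The function names are hypothetical, and the baseline accuracy it derives (roughly 84%) is implied by the two stated figures rather than quoted from the abstract.

```python
def relative_error_reduction(baseline_acc, proposed_acc):
    """Relative error reduction (percent) when accuracy rises from
    baseline_acc to proposed_acc (both given in percent)."""
    baseline_err = 100.0 - baseline_acc
    proposed_err = 100.0 - proposed_acc
    return 100.0 * (baseline_err - proposed_err) / baseline_err


def implied_baseline_accuracy(proposed_acc, reduction_pct):
    """Baseline accuracy (percent) implied by a proposed accuracy and a
    relative error reduction. Assumes the reduction is relative to the
    baseline error rate."""
    proposed_err = 100.0 - proposed_acc
    baseline_err = proposed_err / (1.0 - reduction_pct / 100.0)
    return 100.0 - baseline_err


# Figures stated in the abstract: 92.7% accuracy, 54.32% error reduction.
baseline = implied_baseline_accuracy(92.7, 54.32)
check = relative_error_reduction(baseline, 92.7)
```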
Objective/Hypothesis: Acoustic analysis is a commonly used method for quantitatively measuring vocal fold function. Voice signals are analyzed by selecting a waveform segment and using various algorithms to arrive at parameters such as jitter, shimmer, and signal-to-noise ratio (SNR). Accurate and reliable methods for selecting a representative vowel segment have not been established.
Study Design: Prospective repeated measures experiment.
Methods: We applied a moving window method by isolating consecutive, overlapping segments of the raw voice signal from onset through offset. Ten normal voice signals were analyzed using acoustic measures calculated from the moving window. The location and value of minimum perturbation and maximum SNR were compared across individuals. The moving window method was compared with data from the whole vowel excluding onset and offset, the mid-vowel, and the visually selected steadiest portion of the voice signal.
Results: The steadiest portion of the waveforms, as defined by minimum perturbation and maximum SNR values, was not consistent across individuals. Perturbation and nonlinear dynamic values differed significantly depending on which segment of the waveform was used. Other commonly used segment selection methods resulted in significantly higher perturbation values and significantly lower SNR values than those determined by the moving window method (p<0.001).
Conclusions: The selection of a sample for acoustic analysis can introduce significant inconsistencies into the analysis procedure. The moving window technique may provide more accurate and reliable acoustic measures by objectively identifying the steadiest segment of the voice sample.
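The moving window idea can be illustrated with a minimal Python sketch. It is not the study's implementation. It assumes the glottal cycle periods have already been extracted upstream, uses a simple percent-jitter formula as the only perturbation measure, and slides the window over the cycle sequence instead of the raw waveform. The names `local_jitter` and `steadiest_window` and the synthetic data are hypothetical.

```python
def local_jitter(periods):
    """Percent jitter: mean absolute difference between consecutive
    cycle periods, divided by the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods[:-1], periods[1:])]
    mean_period = sum(periods) / len(periods)
    return 100.0 * (sum(diffs) / len(diffs)) / mean_period


def steadiest_window(periods, window_len, hop=1):
    """Slide an overlapping window over the cycle periods and return
    (start_index, jitter) for the window with the lowest jitter."""
    if window_len < 2 or window_len > len(periods):
        raise ValueError("window_len must be between 2 and len(periods)")
    best = None
    for start in range(0, len(periods) - window_len + 1, hop):
        j = local_jitter(periods[start:start + window_len])
        if best is None or j < best[1]:
            best = (start, j)
    return best


# Synthetic periods (ms): unstable onset and offset, steady middle.
periods = [5.0, 5.6, 4.7, 5.3, 5.0, 5.01, 5.0, 5.02, 5.4, 4.6, 5.5]
start, jitter = steadiest_window(periods, window_len=4)
# The steady stretch begins at index 4 in this synthetic example.
```

The same loop could track the window with maximum SNR instead of minimum jitter, or both, provided a per-window SNR estimate is available.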
SUMMARY This work proposes new features to improve pathological voice quality classification performance: the means, the variances, and the perturbations of higher-order statistics (HOS) such as the skewness and the kurtosis. The HOS-based features show meaningful differences among normal, grade 1, grade 2, and grade 3 voices as classified on the GRBAS scale. The jitter, the shimmer, the harmonic-to-noise ratio (HNR), and the variance of the short-time energy are used as the conventional features. Performance is measured with the classification and regression tree (CART) method. Specifically, the CART-based method that uses both the conventional and the HOS-based features is effective for pathological voice quality measurement, with a classification accuracy of 87.8%.
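A minimal Python sketch of how such HOS-based features might be computed is given below, under stated assumptions. The abstract does not give the frame length, hop size, or the exact definition of "perturbation", so the sketch uses non-overlapping frames by default and takes perturbation to be the mean absolute frame-to-frame difference, by analogy with jitter. All function names are hypothetical.

```python
def _central_moment(x, order):
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x):
    """Third standardized moment of a frame."""
    m2 = _central_moment(x, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant frame")
    return _central_moment(x, 3) / m2 ** 1.5


def kurtosis(x):
    """Fourth standardized moment (not excess kurtosis) of a frame."""
    m2 = _central_moment(x, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant frame")
    return _central_moment(x, 4) / m2 ** 2


def _summary(values):
    """Mean, population variance, and assumed perturbation (mean
    absolute difference between consecutive frames) of a sequence."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    pert = sum(diffs) / len(diffs) if diffs else 0.0
    return mean, var, pert


def hos_features(signal, frame_len, hop=None):
    """Frame the signal, compute per-frame skewness and kurtosis, and
    summarize each track by its mean, variance, and perturbation."""
    hop = hop or frame_len
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    skews = [skewness(f) for f in frames]
    kurts = [kurtosis(f) for f in frames]
    names = ("mean", "var", "pert")
    feats = {f"skew_{n}": v for n, v in zip(names, _summary(skews))}
    feats.update({f"kurt_{n}": v for n, v in zip(names, _summary(kurts))})
    return feats
```

The resulting six values could be concatenated with conventional measures such as jitter, shimmer, and HNR to form the input vector of a tree-based classifier.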