Much work has been done on improving performance by proposing new speech features and new perceptual measures. However, these studies focus on only one of the two aspects. In this paper, we study the relationship between them; that is, we examine which acoustic features, or combinations of features, are most consistent with the actual perception of Chinese initials. We propose a method that measures acoustic distance so that it remains monotonically related to the perceptual distance between Chinese initials. We first define the acoustic distance and the perceptual distance between different Chinese initials, and then select a suitable combination of acoustic features and two compatible distance metrics by conducting clustering analysis on samples of all types of Chinese initials using MFCC and PLP. Using data provided by the General Hospital of the People's Liberation Army, we then calculate the acoustic and perceptual distances. Finally, we calculate Spearman's rho between the two types of distance corresponding to the two calculation methods. The experimental results show a relatively strong monotonic relationship between the two types of distance when the selected acoustic features are used.
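As a rough illustration of the final step, Spearman's rho can be computed by ranking both sets of distances and taking the Pearson correlation of the ranks. The sketch below is not the authors' implementation: the function names and the sample distance values are hypothetical, chosen only to show how a monotonic relationship between acoustic and perceptual distance would be scored.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sd_x * sd_y)


# Hypothetical distances for five pairs of initials (illustrative values only).
acoustic_distance = [0.8, 1.5, 2.1, 3.0, 3.4]
perceptual_distance = [0.10, 0.35, 0.30, 0.70, 0.90]

print(spearman_rho(acoustic_distance, perceptual_distance))
```

A value close to 1 indicates that pairs of initials that are far apart acoustically also tend to be far apart perceptually, which is the kind of monotonic relationship the abstract reports.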
Bone-conducted life sounds are useful for monitoring human health. Although a number of feature extraction methods have been proposed for air-conducted speech, they may not meet the requirements of recognizing bone-conducted life sounds, because air-conducted speech and bone-conducted life sounds differ greatly. To obtain features that characterize bone-conducted signals, in this study we first analyze the properties of bone-conducted life sounds themselves and compare each kind of life sound in the frequency domain. We then apply the F-ratio and an improved F-ratio separately to measure the dependence between frequency components and the characteristics of life sounds. Based on the results of this analysis, we design a new adaptive frequency filter to extract the desired discriminative feature. The new feature is combined with a Hidden Markov Model and applied to classify different kinds of bone-conducted life sounds. The experimental results show that the proposed feature based on the State mean F-ratio reduces the error rate by 7.2% compared with the MFCC feature.
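To make the F-ratio step concrete, the sketch below computes the conventional F-ratio for a single frequency band: the variance of the class means divided by the average within-class variance. This is only the textbook form under stated assumptions; the abstract does not define the improved F-ratio or the State mean F-ratio, so neither is shown here. The function name `f_ratio` and the band energy values are hypothetical.

```python
def f_ratio(class_samples):
    """Conventional F-ratio for one frequency band.

    class_samples: one list per sound class, each holding that band's
    energy values for the samples of the class.
    """
    if len(class_samples) < 2 or any(len(s) == 0 for s in class_samples):
        raise ValueError("need at least two non-empty classes")
    class_means = [sum(s) / len(s) for s in class_samples]
    grand_mean = sum(class_means) / len(class_means)
    # Between-class variance: spread of the class means around their mean.
    between = sum((m - grand_mean) ** 2 for m in class_means) / len(class_means)
    # Within-class variance: mean squared deviation inside each class, averaged.
    within = sum(
        sum((x - m) ** 2 for x in s) / len(s)
        for s, m in zip(class_samples, class_means)
    ) / len(class_samples)
    if within == 0:
        raise ValueError("within-class variance is zero; F-ratio is undefined")
    return between / within


# Hypothetical band energies for two classes of life sounds.
band_a = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]   # classes well separated
band_b = [[1.0, 5.0, 9.0], [2.0, 6.0, 10.0]]  # classes heavily overlapping

print(f_ratio(band_a), f_ratio(band_b))
```

A band with a larger F-ratio separates the classes better, so a filter design along the lines the abstract describes would plausibly give such bands finer resolution or more weight.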