The detection of different levels of physical load from speech has many applications. Besides telemedicine, non-contact detection of certain heart rate ranges can be useful for sports and other leisure-time devices. Available approaches mainly use a large number of spectral and prosodic features. Because the available data sets are typically small, such as the Talk & Run data set and the Munich Biovoice Corpus, the resulting high-dimensional feature spaces are only sparsely populated. We therefore aim to reduce the number of features using modern neural-network-inspired features: bottleneck layer features, obtained from standard low-level descriptors via a feed-forward neural network, and activation map features, obtained from spectrograms via a convolutional neural network. We use these features for an SVM classification of high and low physical load and compare their performance. We also discuss the possibility of transferring the hyperparameters of the extracting networks between different data sets. We show that even for limited amounts of data, deep-learning-based methods can bring a substantial improvement over "conventional" features.
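The bottleneck variant of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for low-level descriptors (not Talk & Run or the Munich Biovoice Corpus), the layer sizes (32, 4, 32) and the tanh activation are assumptions chosen for illustration, and scikit-learn's `MLPClassifier` and `SVC` are swapped in for whatever tooling the paper used. The activation map features from a convolutional network are not covered here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_toy_data(n=200, d=40, seed=0):
    """Synthetic stand-in for per-utterance descriptors (assumption, not real speech)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)       # 0 = low load, 1 = high load
    X = rng.normal(size=(n, d))
    X[:, :5] += 1.5 * y[:, None]         # make a few dimensions class-dependent
    return X, y


def bottleneck_features(mlp, X, n_layers=2):
    """Forward X through the first n_layers hidden layers of a fitted tanh MLP."""
    a = X
    for W, b in zip(mlp.coefs_[:n_layers], mlp.intercepts_[:n_layers]):
        a = np.tanh(a @ W + b)
    return a


X, y = make_toy_data()
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Standardize using training statistics only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Feed-forward network with a narrow 4-unit middle layer (hypothetical sizes).
mlp = MLPClassifier(hidden_layer_sizes=(32, 4, 32), activation="tanh",
                    max_iter=500, random_state=0)
mlp.fit(X_train_s, y_train)

# The activations of the narrow layer replace the 40 input features.
Z_train = bottleneck_features(mlp, X_train_s)
Z_test = bottleneck_features(mlp, X_test_s)

# SVM for high vs. low load on the 4-dimensional bottleneck features.
svm = SVC(kernel="rbf").fit(Z_train, y_train)
acc = svm.score(Z_test, y_test)
print(f"bottleneck dim: {Z_train.shape[1]}, test accuracy: {acc:.2f}")
```

The point of the sketch is the dimensionality reduction: the SVM sees 4 features per sample instead of 40, which matters when only a few hundred samples are available.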