Inference-aware convolutional neural network pruning

Choudhary, Tejalal; Mishra, Vipul Kumar; Goswami, Anurag; Jagannathan, S.

doi:10.1016/j.future.2022.04.031

Cited by 18 publications

(30 citation statements)

References 25 publications

“…Nonetheless, the proposed model should function effortlessly on devices with limited hardware resources and on applications with strict latency requirements without any deterioration in recognition accuracy. To this end, model compression technology is essential [51, 52].…”

Section: Discussion (mentioning)

confidence: 99%

Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS

Toyoshima, Okada, Ishimaru et al. 2023, Sensors

The existing research on emotion recognition commonly uses mel spectrogram (MelSpec) and Geneva minimalistic acoustic parameter set (GeMAPS) as acoustic parameters to learn the audio features. MelSpec can represent the time-series variations of each frequency but cannot manage multiple types of audio features. On the other hand, GeMAPS can handle multiple audio features but fails to provide information on their time-series variations. Thus, this study proposes a speech emotion recognition model based on a multi-input deep neural network that simultaneously learns these two audio features. The proposed model comprises three parts, specifically, for learning MelSpec in image format, learning GeMAPS in vector format, and integrating them to predict the emotion. Additionally, a focal loss function is introduced to address the imbalanced data problem among the emotion classes. The results of the recognition experiments demonstrate weighted and unweighted accuracies of 0.6657 and 0.6149, respectively, which are higher than or comparable to those of the existing state-of-the-art methods. Overall, the proposed model significantly improves the recognition accuracy of the emotion “happiness”, which has been difficult to identify in previous studies owing to limited data. Therefore, the proposed model can effectively recognize emotions from speech and can be applied for practical purposes with future development.

“…To realize widespread use in medical practice and personal use, simpler models that can run even on small-scale electronic devices such as smartphones and wearable devices must be developed. For this problem, model compression techniques [31, 32] will be an effective approach.…”

Section: Discussion (mentioning)

confidence: 99%

Classification of Depression and Its Severity Based on Multiple Audio Features Using a Graphical Convolutional Neural Network

Ishimaru, Okada, Uchiyama et al. 2023, IJERPH

Audio features are physical features that reflect single or complex coordinated movements in the vocal organs. Hence, in speech-based automatic depression classification, it is critical to consider the relationship among audio features. Here, we propose a deep learning-based classification model for discriminating depression and its severity using correlation among audio features. This model represents the correlation between audio features as graph structures and learns speech characteristics using a graph convolutional neural network. We conducted classification experiments in which the same subjects were allowed to be included in both the training and test data (Setting 1) and the subjects in the training and test data were completely separated (Setting 2). The results showed that the classification accuracy in Setting 1 significantly outperformed existing state-of-the-art methods, whereas that in Setting 2, which has not been presented in existing studies, was much lower than in Setting 1. We conclude that the proposed model is an effective tool for discriminating recurring patients and their severities, but it is difficult to detect new depressed patients. For practical application of the model, depression-specific speech regions appearing locally rather than the entire speech of depressed patients should be detected and assigned the appropriate class labels.

“…To realize a wide range of uses in medical practice, it is necessary to construct a simpler model that can even work on small-scale electronic devices such as smartphones and wearable terminals. To address this issue, model-compression techniques [34, 35] would be an effective approach.…”

Section: Discussion (mentioning)

confidence: 99%

End-to-End Convolutional Neural Network Model to Detect and Localize Myocardial Infarction Using 12-Lead ECG Images without Preprocessing

et al. 2022

In recent years, many studies have proposed automatic detection and localization techniques for myocardial infarction (MI) using the 12-lead electrocardiogram (ECG). Most of them applied preprocessing to the ECG signals, e.g., noise removal, trend removal, beat segmentation, and feature selection, followed by model construction and classification based on machine-learning algorithms. The selection and implementation of preprocessing methods require specialized knowledge and experience to handle ECG data. In this paper, we propose an end-to-end convolutional neural network model that detects and localizes MI without such complicated multistep preprocessing. The proposed model executes comprehensive learning for the waveform features of unpreprocessed raw ECG images captured from 12-lead ECG signals. We evaluated the classification performance of the proposed model in two experimental settings: ten-fold cross-validation where ECG images were split randomly, and two-fold cross-validation where ECG images were split into one patient and the other patients. The experimental results demonstrate that the proposed model obtained MI detection accuracies of 99.82% and 93.93% and MI localization accuracies of 99.28% and 69.27% in the first and second settings, respectively. The performance of the proposed method is higher than or comparable to that of existing state-of-the-art methods. Thus, the proposed model is expected to be an effective MI diagnosis tool that can be used in intensive care units and as wearable technology.
