The State of the Art of Feature Extraction Techniques in Speech Recognition

Gupta, Divya; Bansal, Poonam; Choudhary, Kavita

doi:10.1007/978-981-10-6626-9_22

Cited by 42 publications

(23 citation statements)

References 11 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Most of the audio recognition studies use MFCC because it has the best performance in extracting the signal. The study in [ 89 ] shows good training and test results in speech recognition using MFCC [ 89 ]. Thus, in our study, we employ MFCC to assist machine learning in extracting the breathing waveform.…”

Section: Proposed Systemmentioning

confidence: 99%

Non-Contact Monitoring and Classification of Breathing Pattern for the Supervision of People Infected by COVID-19

Purnomo

Lin

Adiprabowo

et al. 2021

Sensors

View full text Add to dashboard Cite

During the pandemic of coronavirus disease-2019 (COVID-19), medical practitioners need non-contact devices to reduce the risk of spreading the virus. People with COVID-19 usually experience fever and have difficulty breathing. Unsupervised care to patients with respiratory problems will be the main reason for the rising death rate. Periodic linearly increasing frequency chirp, known as frequency-modulated continuous wave (FMCW), is one of the radar technologies with a low-power operation and high-resolution detection which can detect any tiny movement. In this study, we use FMCW to develop a non-contact medical device that monitors and classifies the breathing pattern in real time. Patients with a breathing disorder have an unusual breathing characteristic that cannot be represented using the breathing rate. Thus, we created an Xtreme Gradient Boosting (XGBoost) classification model and adopted Mel-frequency cepstral coefficient (MFCC) feature extraction to classify the breathing pattern behavior. XGBoost is an ensemble machine-learning technique with a fast execution time and good scalability for predictions. In this study, MFCC feature extraction assists machine learning in extracting the features of the breathing signal. Based on the results, the system obtained an acceptable accuracy. Thus, our proposed system could potentially be used to detect and monitor the presence of respiratory problems in patients with COVID-19, asthma, etc.

show abstract

Section: Proposed Systemmentioning

confidence: 99%

Non-Contact Monitoring and Classification of Breathing Pattern for the Supervision of People Infected by COVID-19

Purnomo

Lin

Adiprabowo

et al. 2021

Sensors

View full text Add to dashboard Cite

show abstract

“…After that, the logarithm of respective sub-bands is computed. Lastly, MFCC is determined by applying the inverse Fourier transform [39].…”

Section: ) Mel Frequency Cepstral Coefficientsmentioning

confidence: 99%

A Comprehensive Review of Speech Emotion Recognition Systems

et al. 2021

View full text Add to dashboard Cite

During the last decade, Speech Emotion Recognition (SER) has emerged as an integral component within Human-computer Interaction (HCI) and other high-end speech processing systems. Generally, an SER system targets the speaker's existence of varied emotions by extracting and classifying the prominent features from a preprocessed speech signal. However, the way humans and machines recognize and correlate emotional aspects of speech signals are quite contrasting quantitatively and qualitatively, which present enormous difficulties in blending knowledge from interdisciplinary fields, particularly speech emotion recognition, applied psychology, and human-computer interface. The paper carefully identifies and synthesizes recent relevant literature related to the SER systems' varied design components/methodologies, thereby providing readers with a state-of-the-art understanding of the hot research topic. Furthermore, while scrutinizing the current state of understanding on SER systems, the research gap's prominence has been sketched out for consideration and analysis by other related researchers, institutions, and regulatory bodies.

show abstract

“…• Mel-Frequency Cepstral Coefficients (MFCC): These coefficients represent the short term power spectrum of the speech signal and consist of the most widely used spectral features for emotion recognition [14]. Before calculating the cepstral coefficients, the signal is transformed using a Melfilter bank on a number of sub-band energies [15]. • Linear Prediction Cepstral Coefficients (LPCC):…”

Section: Spectral Featuresmentioning

confidence: 99%

Deep Multimodal Emotion Recognition on Human Speech: A Review

Koromilas

Γιαννακόπουλος

2021

Applied Sciences

View full text Add to dashboard Cite

This work reviews the state of the art in multimodal speech emotion recognition methodologies, focusing on audio, text and visual information. We provide a new, descriptive categorization of methods, based on the way they handle the inter-modality and intra-modality dynamics in the temporal dimension: (i) non-temporal architectures (NTA), which do not significantly model the temporal dimension in both unimodal and multimodal interaction; (ii) pseudo-temporal architectures (PTA), which also assume an oversimplification of the temporal dimension, although in one of the unimodal or multimodal interactions; and (iii) temporal architectures (TA), which try to capture both unimodal and cross-modal temporal dependencies. In addition, we review the basic feature representation methods for each modality, and we present aggregated evaluation results on the reported methodologies. Finally, we conclude this work with an in-depth analysis of the future challenges related to validation procedures, representation learning and method robustness.

show abstract

The State of the Art of Feature Extraction Techniques in Speech Recognition

Cited by 42 publications

References 11 publications

Non-Contact Monitoring and Classification of Breathing Pattern for the Supervision of People Infected by COVID-19

Non-Contact Monitoring and Classification of Breathing Pattern for the Supervision of People Infected by COVID-19

A Comprehensive Review of Speech Emotion Recognition Systems

Deep Multimodal Emotion Recognition on Human Speech: A Review

Contact Info

Product

Resources

About