Optimization of MFCC Algorithm for Embedded Voice System

Shi, Tianlong; Zhen, Jiaqi

doi:10.1007/978-981-15-8411-4_88

Cited by 3 publications

(3 citation statements)

References 3 publications


“…The utilization of an artificial neural network (ANN), a long short-term memory (LSTM), and an XGBoost model is presented, each contributing uniquely to enhancing the accuracy and robustness of the classification framework. Long short-term memory (LSTM): LSTM, a type of recurrent neural network (RNN), excels in sequence-based classification tasks by retaining and utilizing information from past inputs 55 . Its specialized architecture with memory cells allows it to capture intricate dependencies within sequential data, making it a valuable choice for applications such as time series prediction and natural language processing.…”

Section: Methods (mentioning)

confidence: 99%

“…Long short-term memory (LSTM): LSTM, a type of recurrent neural network (RNN), excels in sequence-based classification tasks by retaining and utilizing information from past inputs 55 . Its specialized architecture with memory cells allows it to capture intricate dependencies within sequential data, making it a valuable choice for applications such as time series prediction and natural language processing.…”

Section: Methods (mentioning)

confidence: 99%


A novel hybrid model integrating MFCC and acoustic parameters for voice disorder detection

Verma, Benjwal, Chhabra et al. 2023, Sci Rep

Voice is an essential component of human communication, serving as a fundamental medium for expressing thoughts, emotions, and ideas. Disruptions in vocal fold vibratory patterns can lead to voice disorders, which can have a profound impact on interpersonal interactions. Early detection of voice disorders is crucial for improving voice health and quality of life. This research proposes a novel methodology called VDDMFS [voice disorder detection using MFCC (Mel-frequency cepstral coefficients), fundamental frequency and spectral centroid] which combines an artificial neural network (ANN) trained on acoustic attributes and a long short-term memory (LSTM) model trained on MFCC attributes. Subsequently, the probabilities generated by both the ANN and LSTM models are stacked and used as input for XGBoost, which detects whether a voice is disordered or not, resulting in more accurate voice disorder detection. This approach achieved promising results, with an accuracy of 95.67%, sensitivity of 95.36%, specificity of 96.49% and f1 score of 96.9%, outperforming existing techniques.
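The stacking step described in this abstract, where the probabilities from two base models become the input features of a final classifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base-model probabilities are made-up toy values, the function names are hypothetical, and a small logistic-regression meta-learner is swapped in for XGBoost so the example needs only the standard library.

```python
import math


def stack_features(p_ann, p_lstm):
    """Pair the two base-model probabilities into one feature row per sample."""
    if len(p_ann) != len(p_lstm):
        raise ValueError("both base models must score the same samples")
    return [[a, b] for a, b in zip(p_ann, p_lstm)]


def fit_meta_learner(rows, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner with batch gradient descent.

    Assumption: this stands in for XGBoost purely for illustration.
    """
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(rows, w, b):
    """Return 1 (disordered) when the meta-learner's score is positive, else 0."""
    return [1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0 for x in rows]


# Toy base-model outputs (hypothetical values, not data from the paper).
p_ann = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1]
p_lstm = [0.8, 0.9, 0.6, 0.3, 0.1, 0.2]
labels = [1, 1, 1, 0, 0, 0]

rows = stack_features(p_ann, p_lstm)
w, b = fit_meta_learner(rows, labels)
train_pred = predict(rows, w, b)
new_pred = predict(stack_features([0.85, 0.15], [0.9, 0.1]), w, b)
```

In a fuller pipeline the stacked rows would come from held-out predictions of the trained ANN and LSTM, so that the meta-learner does not see probabilities the base models produced on their own training data.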


“…If the voice signal is smoother and more uniform, the parameters for extracting voice features will be better and the quality of voice processing will be better. Pre-emphasis: The first step before processing the voice signal is to pre-emphasize the voice signal. The speech signal is pre-emphasized because it is affected by oral and nose radiation and glottal excitation, and the high-frequency end of the average power spectrum is attenuated by 6 dB above 800 Hz (Shi and Zhen, 2020). The pre-emphasis process generally uses a 6 dB high-frequency boost pre-emphasis digital filter to boost the high-frequency part of the voice signal.…”

Section: Related Work (mentioning)

confidence: 99%
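The pre-emphasis step mentioned in the statement above is commonly realized as a first-order high-pass filter, y[n] = x[n] - a * x[n-1]. The sketch below assumes that form; the coefficient value 0.97 and the function name are assumptions chosen for illustration and are not taken from the cited paper.

```python
def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through.

    alpha=0.97 is an assumed, commonly used value, not one given in the source.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


# A constant (low-frequency) input is almost cancelled after the first sample,
# while a sign-alternating (high-frequency) input is nearly doubled.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

Comparing the two outputs shows the intended effect: slowly varying content is suppressed and rapidly varying content is boosted before feature extraction.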

The impact of student learning aids on deep learning and mobile platform on learning behavior

Fan, Liu 2022, LHT

Purpose: Deep learning (DL) technology is used to design a voice evaluation system to understand the impact of learning aids on DL and mobile platforms on students’ learning behavior.

Design/methodology/approach: DL technology is used to design a speech evaluation system.

Findings: The experimental results show that the speech evaluation system designed has a high accuracy rate, the highest agreement rate with manual evaluation of pronunciation is 89.5%, and the correct speech recognition rate is 96.64%. The designed voice evaluation system and the manual voice rating system have a maximum error rate of 2%. The experimental results suggest that it is necessary to further optimize the learning aids for mobile platform. The learning aids of the mobile platform need to be further optimized to promote the improvement of student learning efficiency.

Originality/value: The results show that the speech evaluation system designed has good practical application value, and it provides a certain reference value for the future study of learning tools on DL.


Arabic Speech Recognition by Stationary Bionic Wavelet Transform and MFCC Using a Multi-layer Perceptron for Voice Control

Talbi 2022, Signals and Communication Technology
