2020
DOI: 10.1109/access.2020.3031763

A Speech Emotion Recognition Model Based on Multi-Level Local Binary and Local Ternary Patterns

Abstract: Interpreting a speech signal is quite challenging because it consists of different frequencies and features that vary according to emotions. Although different algorithms are being developed in the speech emotion recognition (SER) domain, the success rates vary according to the spoken languages, emotions, and databases. In this study, a new lightweight effective SER method has been developed that has low computational complexity. This method, called 1BTPDN, is applied on RAVDESS, EMO-DB, SAVEE, and EMOVO datab…

Citations: cited by 27 publications (6 citation statements)
References: 55 publications
“…Several studies have applied signal processing using discrete wavelet transform (DWT). One study applied DWT for signal noise reduction on a FPGA-based chip design platform and integrated audio encoder on a FPGA board [1,2]. One study developed a new algorithm for DAW based on DWT yielding better SNR and BER rates compared with other approaches [3].…”
Section: Methods (mentioning)
confidence: 99%
“…To accomplish this, an effective feature descriptor is needed that is able to better capture the properties of vocal tract variations of different speakers with or without face mask. The effectiveness of local acoustic patterns in audio spoofing countermeasures for better capturing the traits of bonafide and spoof samples in [26] motivated us to develop the local Ternary Deviated overlapping patterns (TDoP) descriptor for ASV systems. For this purpose, we propose a 22D novel TDoP feature descriptor to better capture the vocal dynamics of different speakers in the time domain and fuse them with 14D GTCC and 14D MFCC features to extract the relevant information in frequency domain.…”
Section: Proposed Methods (mentioning)
confidence: 99%
“…Even when different algorithms for SER are created, success rates vary according to the language of the speech, emotions, and datasets. A unique text-independent and speaker-independent SER method called 1BTPDN was used to build a lightweight strategy for addressing a nonpolynomial issue by scraping handmade features 2020b). The 1BTPDN approach assists in minimizing important data loss while also increasing accuracy.…”
Section: Speech Emotion Recognition (mentioning)
confidence: 99%