2022
DOI: 10.3390/app12199979

Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments

Abstract: In one study on vocal emotion recognition using noise-vocoded speech (NVS), the high similarities between modulation spectral features (MSFs) and the results of vocal-emotion-recognition experiments indicated that MSFs contribute to vocal emotion recognition in a clean environment (with no noise and no reverberation). Other studies also clarified that vocal emotion recognition using NVS is not affected by noisy reverberant environments (signal-to-noise ratio is greater than 10 dB and reverberation time is less…

Cited by 2 publications (3 citation statements)
References 24 publications
“…Our previous experiments in which we systematically varied the number of channels in the NVS scheme and the upper limitation of the modulation frequency components [11,12] demonstrated that temporal modulation cues play a key role in the recognition of both vocal emotion and speaker individuality, particularly at modulation frequencies lower than 8 Hz. Our research has also demonstrated that these features are robust to noise and reverberation [13,14]. However, we have still not clarified the influence of temporal modulation cues in non-linguistic information such as that used in emergency situations.…”
Section: Introduction (citation type: mentioning)
confidence: 79%
“…First, we reduced the effect of the average intensity by normalizing the active speech levels of all speech signals to −26 dBov with a P.56 speech voltmeter [14]. We then implemented band-pass filters (BPF) (essentially functioning as a band-pass filterbank) to divide the signals into several frequency bands.…”
Section: Noise-vocoded Speech (citation type: mentioning)
confidence: 99%
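
The level-normalization step quoted above can be illustrated with a minimal sketch. It is not the citing authors' procedure: it uses the plain RMS level of the whole signal as a simplified stand-in for the P.56 active speech level (which measures only the active portions of speech), and it assumes samples are floats with a full-scale amplitude of 1.0, so that 0 dBov corresponds to an RMS of 1.0. The function names `rms_level_dbov` and `normalize_to_dbov` are hypothetical.

```python
import math


def rms_level_dbov(samples, full_scale=1.0):
    """Return the RMS level of `samples` in dB relative to the overload point.

    Assumption: plain RMS over the whole signal, NOT the P.56 active speech
    level, which excludes pauses via an activity threshold.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / (full_scale ** 2))


def normalize_to_dbov(samples, target_dbov=-26.0, full_scale=1.0):
    """Scale `samples` by one gain so their RMS level equals `target_dbov`."""
    current = rms_level_dbov(samples, full_scale)
    if current == float("-inf"):
        raise ValueError("cannot normalize an all-zero signal")
    gain = 10.0 ** ((target_dbov - current) / 20.0)
    return [x * gain for x in samples]


# Usage: a 1 s, 440 Hz test tone at 16 kHz standing in for a speech signal.
sample_rate = 16000
tone = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / sample_rate)
        for n in range(sample_rate)]
normalized = normalize_to_dbov(tone, target_dbov=-26.0)
```

A single broadband gain keeps the spectral and temporal envelope shape intact, so only the average intensity changes before the signal is split into frequency bands.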