2018
DOI: 10.1250/ast.39.379

Contribution of modulation spectral features on the perception of vocal-emotion using noise-vocoded speech

Abstract: Previous studies on noise-vocoded speech showed that the temporal modulation cues provided by the temporal envelope play an important role in the perception of vocal emotion. However, the exact role that the temporal envelope and its modulation components play in the perceptual processing of vocal emotion is still unknown. To clarify the exact features that the temporal envelope contributes to the perception of vocal emotion, a method based on the mechanism of modulation frequency analysis in the auditory syst…

Cited by 10 publications (8 citation statements)
References 12 publications
“…In auditory front-ends, temporal modulation cues are obtained using auditory filtering of speech signal and modulation filtering of temporal amplitude envelope. These cues contain rich spectral-temporal information to perceive the variations of intensity, duration and pitch of speech [2] and have been widely used in sound-texture perception [3], speaker-individuality perception [4], speech recognition [5], and emotion recognition [6], [7]. Most studies extracted the modulation spectral features (MSFs) from temporal modulation cues by calculating the spectral centroid, flatness, skewness, kurtosis, and other statistical features.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
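The two-stage idea in this statement (take the temporal amplitude envelope of a band signal, then analyze the envelope's own frequency content) can be illustrated with a minimal sketch. It assumes the signal has already passed through one auditory filter band, uses a Hilbert envelope, and uses a plain FFT in place of a modulation filterbank; none of these choices are confirmed by the quoted text. The function names `temporal_envelope` and `modulation_spectrum` are hypothetical.

```python
import numpy as np

def temporal_envelope(x):
    """Hilbert envelope: magnitude of the analytic signal, built via the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist), double positive frequencies, zero negative ones.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))

def modulation_spectrum(envelope, fs):
    """Magnitude spectrum of the mean-removed envelope (assumed stand-in
    for a modulation filterbank)."""
    env = envelope - np.mean(envelope)
    mag = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, mag

# Synthetic band signal: 1 kHz carrier, amplitude-modulated at 4 Hz.
fs = 8000
t = np.arange(fs) / fs  # one second of samples
x = (1.0 + 0.8 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env = temporal_envelope(x)
freqs, mag = modulation_spectrum(env, fs)
peak_hz = freqs[np.argmax(mag)]  # dominant modulation frequency
```

For this synthetic input the dominant modulation frequency falls at the 4 Hz modulator, which is the kind of cue the modulation-domain statistics are then computed on.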
“…We previously reported that temporal modulation-spectral features derived from the temporal-modulation spectrogram (time, acoustical frequency, and modulation frequency domains), as shown in Figs. 6 and 7, can be clarified to categorize emotion types according to the discriminability index (d′) in vocal-emotion recognition [42]. In particular, higher-order moments, such as modulation spectral centroid (MSCR), modulation spectral spread (MSSP), modulation spectral skewness (MSSK), modulation spectral kurtosis (MSKT), and modulation spectral tilt (MSTL), are useful modulation-spectral features.…”
Section: Temporal Modulation-spectral Features
Citation type: mentioning
confidence: 99%
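The moment-based features named in this statement can be sketched for a single channel as follows. The definitions used here are the common textbook ones (the normalized magnitude spectrum treated as a distribution over modulation frequency, and tilt taken as the slope of a least-squares line); they are an assumption, since the quoted text does not give the formulas used in [42]. The function name `modulation_spectral_moments` is hypothetical.

```python
import numpy as np

def modulation_spectral_moments(freqs, mag):
    """Centroid, spread, skewness, kurtosis, and tilt of one modulation
    spectrum. Definitions are assumed, not taken from the cited work."""
    freqs = np.asarray(freqs, dtype=float)
    mag = np.asarray(mag, dtype=float)
    p = mag / np.sum(mag)                      # normalize to unit sum
    centroid = np.sum(freqs * p)               # MSCR: first moment
    dev = freqs - centroid
    spread = np.sqrt(np.sum(dev ** 2 * p))     # MSSP: standard deviation
    skewness = np.sum(dev ** 3 * p) / spread ** 3   # MSSK
    kurtosis = np.sum(dev ** 4 * p) / spread ** 4   # MSKT
    tilt = np.polyfit(freqs, mag, 1)[0]        # MSTL: slope of a linear fit
    return {"MSCR": centroid, "MSSP": spread, "MSSK": skewness,
            "MSKT": kurtosis, "MSTL": tilt}

# Toy symmetric spectrum over five modulation-frequency bins (in Hz).
moments = modulation_spectral_moments([1, 2, 3, 4, 5], [1, 2, 3, 2, 1])
```

In practice these values would be computed per acoustic-frequency or modulation-frequency channel and compared across emotion categories; the toy spectrum above only shows the arithmetic.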
“…Estimated probability-distribution function of MSCR (MSCR_m, m = 4) under clean conditions [42]. Similarity of each modulation-spectral feature (taken across all acoustic or modulation frequency channels) [42].…”
Citation type: mentioning
confidence: 99%
“…Studies revealed that TAEs of speech and their modulation spectral components contribute to the recognition of linguistic information as well as perception of non-linguistic information. We previously investigated the effect of controlling the upper limit of the modulation frequency of a TAE on the perception of non-linguistic information, i.e., speaker individuality and vocal-emotion, using NVS [7][8][9]. It was found that the speaker-distinction and vocal-emotion-recognition rates decrease as the upper limit of the modulation frequency becomes lower [7][8][9].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…We previously investigated the effect of controlling the upper limit of the modulation frequency of a TAE on the perception of non-linguistic information, i.e., speaker individuality and vocal-emotion, using NVS [7][8][9]. It was found that the speaker-distinction and vocal-emotion-recognition rates decrease as the upper limit of the modulation frequency becomes lower [7][8][9].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
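Controlling the upper limit of the modulation frequency of a temporal amplitude envelope (TAE), as described in the statements above, can be sketched as a low-pass operation on the envelope itself. The example below uses an FFT-domain brick-wall cutoff purely for clarity; the filter actually used in the cited studies is not stated here, so this is an assumption, and the function name `limit_modulation_frequency` is hypothetical. In a full noise vocoder the limited envelope of each band would then modulate a noise carrier restricted to that band, which is omitted.

```python
import numpy as np

def limit_modulation_frequency(envelope, fs, cutoff_hz):
    """Remove envelope modulation components above cutoff_hz using a
    brick-wall cut in the FFT domain (assumed filter shape)."""
    spec = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    spec[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=len(envelope))

# Toy envelope with a slow (4 Hz) and a fast (32 Hz) modulation component.
fs = 1000
t = np.arange(fs) / fs  # one second of samples
slow = 0.5 * np.cos(2 * np.pi * 4 * t)
fast = 0.3 * np.cos(2 * np.pi * 32 * t)
envelope = 1.0 + slow + fast

# With a 16 Hz upper limit, only the DC level and the 4 Hz component remain.
limited = limit_modulation_frequency(envelope, fs, cutoff_hz=16.0)
```

Lowering `cutoff_hz` strips progressively slower envelope fluctuations, which is the manipulation the quoted statements associate with falling speaker-distinction and vocal-emotion-recognition rates.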