A three-layered model for expressive speech perception

Huang, Chun-Fang; Akagi, Masato

doi:10.1016/j.specom.2008.05.017

Cited by 40 publications

(29 citation statements)

References 31 publications


“…The emotional speech data used in this study were selected from the Fujitsu Japanese Emotional Speech Database [16]. This database includes five emotions (neutral, joy, cold anger, sadness, and hot anger) expressed by one professional actress.…”

Section: Speech Data

mentioning

confidence: 99%

“…For speaker individuality, spectral envelope and formants of speech have been proved to contribute speaker recognition [11][12][13]. For vocal emotion, previous works focused on the acoustic features conveyed in speech, such as F0, spectral envelope, intensity, and speech rate [14][15][16]. For both speaker individuality and vocal emotion, the time-averaged acoustic features were investigated.…”

Section: Introduction

mentioning

confidence: 99%

“…Experiments of speaker and vocal-emotion recognition were carried out using an analysis/synthesis method of noise-vocoded speech (NVS). The temporal resolution of NVS was controlled by varying the upper limit of modulation frequency (0, 0.5, 1, 2, 4, 8, 16, 32, and 64 Hz). In addition, the role of temporal cue in the different spectral resolution condition was also investigated by varying the number of channels (4, 8, and 16).…”

mentioning

confidence: 99%


Contributions of temporal cue on the perception of speaker individuality and vocal emotion for noise-vocoded speech

Zhu, Miyauchi, Araki, et al. (2018). Acoust. Sci. & Tech.


This paper investigates the importance of temporal cues in the perception of speaker individuality and vocal emotion. Experiments of speaker and vocal-emotion recognition were carried out using an analysis/synthesis method of noise-vocoded speech (NVS). The temporal resolution of NVS was controlled by varying the upper limit of modulation frequency (0, 0.5, 1, 2, 4, 8, 16, 32, and 64 Hz). In addition, the role of temporal cue in the different spectral resolution condition was also investigated by varying the number of channels (4, 8, and 16). The results demonstrated that temporal resolution contributes to the recognition of both speaker and vocal emotion. Therefore, temporal cues are found to be important for the perception of not only linguistic information but also speaker individuality and vocal emotion. On the other hand, the performance of speaker recognition was less sensitive to the spectral resolution, at least in the limited set of stimuli in the present study. For vocal-emotion recognition, the spectral resolution was shown to be important for recognizing only neutral, joy, and cold anger, but not sadness or hot anger. The important modulation frequency band for the perception of nonlinguistic information was suggested to be higher than that of linguistic information.


“…All emotional speech signals used in this study were selected from the Fujitsu Japanese Emotional Speech Database [9]. This database included five emotions (neutral, joy, cold anger, sadness, and hot anger) spoken by one female speaker.…”

Section: Spectrogram

mentioning

confidence: 99%

Feasibility of vocal emotion conversion on modulation spectrogram for simulated cochlear implants

Zhu, Miyauchi, Araki, et al. (2017). 2017 25th European Signal Processing Conference (EUSIPCO)


Cochlear implant (CI) listeners were found to have great difficulty with vocal emotion recognition because of the limited spectral cues provided by CI devices. Previous studies have shown that the modulation spectral features of temporal envelopes may be important cues for vocal emotion recognition of noise-vocoded speech (NVS) as simulated CIs. In this paper, the feasibility of vocal emotion conversion on a modulation spectrogram for simulated CIs for correctly recognizing vocal emotion is confirmed. A method based on a linear prediction scheme is proposed to modify the modulation spectrogram and its features of neutral speech to match that of emotional speech. The logic of this approach is that if vocal emotion perception of NVS is based on the modulation spectral features, NVS with similar modulation spectral features of emotional speech will be recognized as the same emotion. As a result, it was found that the modulation spectrogram of neutral speech can be successfully converted to that of emotional speech. The results of the evaluation experiment showed the feasibility of vocal emotion conversion on the modulation spectrogram for simulated CIs. The vocal emotion enhancement on the modulation spectrogram was also further discussed.


“…It might be within a certain range due to the uncertainty of cache hit ratio or the workload I/O pattern. One possible approach is applying fuzzy inference (Mamdani & Assilian, 1975) to generate fuzzy rules from the configuration (Huang & Akagi, 2008) and apply them in a rule-based engine (Huang & Katayama, 2005). This is an area for future work.…”

Section: Example Of the Aqr-storage Output

mentioning

confidence: 99%

Intelligent Software-Defined Storage With Deep Traffic Modeling for Cloud Storage Service

Huang, Huang, Chen (2016). STSC


The advent of cloud computing, big data, and mobile computing has created a fast-growing demand for storage. Cloud service providers are looking for cost-effective storage solutions as an alternative to traditional, high-cost, embedded-systems-based storage to meet the needs of newly emerging applications, such as messaging, video streaming, data analytics, etc. In particular, they are facing the challenge of lowering costs while still accommodating multi-workloads on a single instance of storage without compromising workload performance requirements. Software-defined storage (SDS) is a new generation of storage system. Unlike traditional embedded-systems-based storage, the SDS uses a software-stack above commodity hardware to provide more valuable and cost-effective features. To meet the challenges cloud service providers are facing, this paper introduces the architecture of a new SDS platform called Federator. It also argues that the architecture of an SDS platform should have three main characteristics: 1. separation of the control and data pathways, 2. self-configuration of storage resources, and 3. RESTful APIs for new business extension. This paper specifically introduces the storage I/O traffic modeling supported by Federator. With this capability, storage performance metrics are generated by using Long Short-Term Memory (LSTM). This prediction capability is important to a self-configurable SDS to meet performance requirements.
