2003
DOI: 10.1016/s0167-6393(02)00083-3
Emotions, speech and the ASR framework

Abstract: Automatic recognition and understanding of speech are crucial steps towards natural human-machine interaction. Apart from the recognition of the word sequence, the recognition of properties such as prosody, emotion tags or stress tags may be of particular importance in this communication process. This paper discusses the possibilities of recognizing emotion from the speech signal, primarily from the viewpoint of automatic speech recognition (ASR). The general focus is on the extraction of acoustic features from …

Cited by 83 publications (31 citation statements). References 44 publications.
“…The reason is probably that elaborate tools such as HTK had previously been developed for similar tasks such as speech and speaker recognition. For acted emotions there are numerous references [215,188,249]; for non-acted emotions, fewer are known [105,225,231,193]. The performance of static modelling through functionals is often reported as superior [188,225,193], as emotion is apparently better modelled on a time-scale above frame level; note that a combination of static features such as minimum, maximum, onset, offset, duration, regression, etc. implicitly shapes contour dynamics as well.…”
Section: Classification
confidence: 99%
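The "static modelling through functionals" described in the statement above can be illustrated with a minimal sketch: a frame-level contour (e.g., F0 sampled every 10 ms) is collapsed into a fixed-size vector of utterance-level statistics. The function name and the exact set of functionals are illustrative, not taken from the cited systems.

```python
import numpy as np

def contour_functionals(frames):
    """Summarize a frame-level feature contour with static functionals
    (minimum, maximum, onset, offset, duration, regression slope, ...),
    yielding one fixed-size description per utterance regardless of its
    length. The functional set here is a hypothetical example."""
    x = np.asarray(frames, dtype=float)
    t = np.arange(len(x))
    # Linear regression over time: the slope coarsely captures the
    # overall rise or fall of the contour.
    slope, _intercept = np.polyfit(t, x, 1)
    return {
        "min": float(x.min()),
        "max": float(x.max()),
        "mean": float(x.mean()),
        "range": float(x.max() - x.min()),
        "onset": float(x[0]),    # value at contour start
        "offset": float(x[-1]),  # value at contour end
        "duration": len(x),      # in frames
        "slope": float(slope),   # regression coefficient
    }
```

Because combinations of such statistics constrain the possible shapes of the contour, they implicitly encode dynamics even though each functional is computed statically over the whole utterance.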
“…In fact, existing automatic speech recognition systems can benefit from the extra information that emotion recognition can provide (Ten Bosch, 2003; Dusan and Rabiner, 2005). It would be useful to produce speech transcripts that not only contain the words said by different speakers, but also the speaker's state or emotion under which the words were said.…”
Section: Introduction
confidence: 99%
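The emotion-annotated transcripts envisioned in the statement above could take a form like the following minimal sketch, where each segment carries the words together with a speaker-state label. All field names and the label set are illustrative assumptions, not a format defined by the cited papers.

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:
    """One hypothetical entry of an emotion-annotated transcript:
    the recognized words plus the speaker state under which they
    were said."""
    speaker: str
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    words: str
    emotion: str   # e.g. "neutral", "anger"; label set is illustrative

# Example transcript: same speaker, changing emotional state.
segments = [
    TranscriptSegment("A", 0.0, 1.8, "we are late again", "anger"),
    TranscriptSegment("A", 1.8, 3.2, "never mind", "neutral"),
]
```

Attaching the state label at segment level (rather than per word) matches the observation in the quoted statements that emotion is better modelled on a time-scale above frame level.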
“…In Estonian, commas are exactly the punctuation marks at which a pause is often left unmade. Some earlier studies confirm the same (see Tamuri 2007; Pajupuu and Kerge 2006). In the research material, commas were distributed roughly equally across the emotions: 72 in sadness sentences, 67 in joy sentences and 80 in anger sentences. The reader paused at a comma least often in sadness sentences, in about half of the cases.…” (translated from Estonian)
Section: Research questions and material (unclassified)