2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)
DOI: 10.1109/asru.2007.4430180
Comparing one and two-stage acoustic modeling in the recognition of emotion in speech

Cited by 30 publications (16 citation statements) | References 3 publications
“…Taking into account current affective cues, responses have to be prepared already before the user has finished speaking, and hypotheses have to be generated on-the-fly. It might not make sense at first to detect emotion from speech every few milliseconds, since emotion remains bound to syllables or even words [31]. However, if the output is smoothed over short time periods, reliable estimates of quasi-instantaneous emotions can be given, without the need to identify word boundaries, for example.…”
Section: Introduction
confidence: 99%
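The smoothing idea in the excerpt above can be sketched as a causal moving average over per-frame class posteriors. The window length, the two-class setup, and the synthetic posteriors below are illustrative assumptions, not details from the cited work:

```python
import numpy as np

def smooth_posteriors(frame_posteriors: np.ndarray, window: int = 25) -> np.ndarray:
    """Causal moving average over per-frame class posteriors.

    frame_posteriors: shape (n_frames, n_classes), one distribution per
    short-time frame. window: number of past frames averaged (assumed value).
    """
    n_frames, _ = frame_posteriors.shape
    smoothed = np.empty_like(frame_posteriors, dtype=float)
    for t in range(n_frames):
        start = max(0, t - window + 1)
        smoothed[t] = frame_posteriors[start : t + 1].mean(axis=0)
    return smoothed

# Synthetic two-class example: class 1 dominates on average, but individual
# frames flip due to noise -- smoothing stabilises the running decision
# without waiting for word boundaries.
rng = np.random.default_rng(0)
noisy = np.clip(np.tile([0.35, 0.65], (100, 1)) + rng.normal(0, 0.2, (100, 2)), 0, 1)
noisy /= noisy.sum(axis=1, keepdims=True)   # renormalise rows to distributions
smoothed = smooth_posteriors(noisy, window=25)
```

Because each smoothed frame averages rows that each sum to one, the output rows remain valid distributions, and the per-frame decision fluctuates far less than the raw posteriors.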
“…The term 'segment' will be used in what follows to refer to a general unit of analysis. Finding the optimal unit of analysis is still an active area of research [Schuller et al. 2007b; Schuller et al. 2008; Busso et al. 2007]. As stated in [Zeng et al. 2009], segmentation is one of the most important issues for real applications but has been "largely unexplored so far".…”
Section: The Unit of Analysis
confidence: 99%
“…Speech and speaker recognition techniques, i.e. short-term features and statistical modeling (GMM, HMM), have been successfully combined with a traditional turn-based approach [15]. In [16], a time-scale is identified by the extraction of short-term features (25 ms windows, MFCC) and the use of statistical modeling (HMM). The authors call this time-scale the chunk level.…”
Section: Machine Learning Based Units
confidence: 99%
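The short-term feature extraction described above starts by slicing the waveform into overlapping 25 ms analysis frames before computing MFCCs. A minimal framing sketch follows; the 10 ms hop, 16 kHz sample rate, and Hamming window are common defaults assumed here, not values taken from the cited work:

```python
import numpy as np

def frame_signal(x: np.ndarray, sr: int = 16000,
                 win_ms: int = 25, hop_ms: int = 10) -> np.ndarray:
    """Slice a waveform (assumed at least one window long) into overlapping
    short-time frames and apply a Hamming window, as done before MFCC
    computation. 25 ms window per the excerpt; hop/sr/window are assumptions."""
    win = int(sr * win_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)   # 160 samples at 16 kHz
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop : i * hop + win] for i in range(n_frames)])
    return frames * np.hamming(win)  # taper frame edges before the FFT stage

one_second = np.zeros(16000)        # 1 s of silence at 16 kHz
frames = frame_signal(one_second)   # 1 + (16000 - 400) // 160 = 98 frames
```

Each 25 ms frame then yields one MFCC vector, and the resulting frame-level feature sequence is what the HMM models at the chunk level.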
“…''Combining frame-level and segment-based Approach for Intention Recognition in infant-directed Speech'' section: the idea is to exploit speaker recognition techniques, which are mainly based on frame-level modeling (all frames are exploited for the characterization), as is done in [16,18]. - data-driven approach (see ''Data-Driven Approach for Time-Scale Feature Extraction''): speech signals are characterized by prominent segments such as vowels, which are then employed as sub-units.…”
Section: Data-Fusion Approach
confidence: 99%