2011
DOI: 10.1109/tmm.2010.2101586

Audiovisual Discrimination Between Speech and Laughter: Why and When Visual Information Might Help

Abstract: Past research on automatic laughter classification/detection has focused mainly on audio-based approaches. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating the information from audio and video channels may lead to improved performance over single-modal approaches. Both the audio and the visual channel consist of two streams (cues): facial expressions and head pose for video, and cepstral and prosodic features for audio. Two types of experiments were …
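The abstract describes a two-stream-per-modality setup (facial expressions and head pose for video; cepstral and prosodic features for audio) whose outputs are integrated. The paper evaluates its own fusion schemes; purely as an illustration, the following is a minimal sketch of decision-level (late) fusion, where per-stream classifier posteriors are combined with weights. All stream names, posterior values, and weights below are hypothetical, not taken from the paper.

```python
import numpy as np

def late_fusion(stream_posteriors, weights):
    """Combine per-stream posteriors P(laughter | stream) with a weighted sum.

    stream_posteriors: dict mapping stream name -> posterior probability of 'laughter'
    weights: dict mapping stream name -> non-negative weight (normalised internally)
    Returns the fused posterior; classify as laughter if it exceeds 0.5.
    """
    names = list(stream_posteriors)
    w = np.array([weights[n] for n in names], dtype=float)
    w /= w.sum()                      # normalise weights so they sum to 1
    p = np.array([stream_posteriors[n] for n in names])
    return float(np.dot(w, p))

# Hypothetical per-stream outputs for one pre-segmented episode.
posteriors = {
    "facial_expressions": 0.82,   # video stream 1
    "head_pose":          0.61,   # video stream 2
    "cepstral":           0.90,   # audio stream 1 (e.g. MFCC-based)
    "prosodic":           0.74,   # audio stream 2 (e.g. pitch/energy-based)
}
weights = {"facial_expressions": 1.0, "head_pose": 0.5, "cepstral": 1.5, "prosodic": 1.0}

fused = late_fusion(posteriors, weights)
print("fused P(laughter) =", fused, "->", "laughter" if fused > 0.5 else "speech")
```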

Cited by 43 publications (48 citation statements)
References 47 publications
“…Inspired by [51], we follow a face-anatomy-driven rather than a simply data-driven approach to identifying the most suitable feature representation of facial shape for the problem at hand. To this end, we visually inspect the deformation pattern associated with each component of the ASM.…”
Section: Features (mentioning)
confidence: 99%
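The quoted passage describes visually inspecting the deformation pattern associated with each component of an Active Shape Model (ASM) in order to pick anatomically meaningful shape features. A minimal sketch of that inspection step, assuming the ASM modes come from a PCA over aligned landmark coordinates; the file name, data layout, and number of modes shown are assumptions for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: N aligned face shapes, each a flattened (x1, y1, ..., xK, yK) vector.
shapes = np.load("aligned_landmarks.npy")        # shape (N, 2*K), assumed to exist
mean_shape = shapes.mean(axis=0)

# Principal components of shape variation (the ASM's modes of deformation).
cov = np.cov(shapes - mean_shape, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

def plot_mode(i, n_std=3.0):
    """Plot mean shape +/- n_std standard deviations along mode i for visual inspection."""
    for s, style in [(-n_std, "r--"), (0.0, "k-"), (+n_std, "b--")]:
        shape = mean_shape + s * np.sqrt(eigvals[i]) * eigvecs[:, i]
        xs, ys = shape[0::2], shape[1::2]
        plt.plot(xs, -ys, style, marker=".")     # flip y so the face is drawn upright
    plt.title(f"ASM mode {i}: which facial regions does it deform?")
    plt.axis("equal")
    plt.show()

for i in range(5):                               # inspect the first few modes
    plot_mode(i)
```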
“…We should also point out that the performance of laughter detection is far from the performance of laughter-vs-speech classification based on presegmented episodes where F1 measures close to 90% are achieved [13], [15]. It is also difficult to compare with other approaches, given the completely different datasets and features used.…”
Section: Results (mentioning)
confidence: 96%
“…Cepstral features, such as MFCCs, have been widely used in speech recognition and have also been successfully used for laughter detection [2], [3] and laughter-vs-speech discrimination [13], [15]. Therefore, we use 13 MFCCs in this study as well, which are computed every 10ms over a window of 40ms, i.e.…”
Section: A. Audio Features (mentioning)
confidence: 99%
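The quote specifies 13 MFCCs computed every 10 ms over a 40 ms window. A minimal sketch of such a feature extractor using librosa; the audio file name and 16 kHz sampling rate are assumptions, and the paper's exact filterbank settings are not stated here.

```python
import librosa

# Assumed input: a mono audio file, resampled to 16 kHz.
y, sr = librosa.load("episode.wav", sr=16000)

# 13 MFCCs, 40 ms analysis window, 10 ms hop, matching the quoted setup.
win_length = int(0.040 * sr)     # 640 samples at 16 kHz
hop_length = int(0.010 * sr)     # 160 samples at 16 kHz

mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=1024, win_length=win_length, hop_length=hop_length,
)
print(mfcc.shape)                # (13, number_of_frames)
```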
“…Speech can be considered as an indirect biosignal that is very well suited to unveil the emotional state of a person. Non-speech utterances have also been shown to be of interest for emotion-aware computing [78]; however, they are relatively unexplored. The audio recordings used for speech processing suffer from various types of noise.…”
Section: Ubiquitous Signals of Emotion (mentioning)
confidence: 99%