2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
DOI: 10.1109/icassp.2006.1660200

Combining Prosodic, Lexical and Cepstral Systems for Deceptive Speech Detection

Abstract: We report on machine learning experiments to distinguish deceptive from nondeceptive speech in the Columbia-SRI-Colorado (CSC) corpus. Specifically, we propose a system combination approach using different models and features for deception detection. Scores from an SVM system based on prosodic/lexical features are combined with scores from a Gaussian mixture model system based on acoustic features, resulting in improved accuracy over the individual systems. Finally, we compare results from the prosodic-only SV…

Cited by 58 publications (49 citation statements)
References 6 publications (7 reference statements)
“…Both of them have also been used in previous works on audio-only laughter-versus-speech discrimination (see Table I). The addition of prosodic features to MFCCs, both on decision level and feature level, has been proven to be beneficial for deceptive speech detection [20] and for language identification [67]. In addition, Bachorowski et al [4] found that the mean pitch in both male and female laughter was higher than in modal speech.…”
Section: B Prosodic Features (mentioning)
confidence: 99%
“…Recent work in AI explores methods for the automatic detection of other types of pragmatic variation in text and conversation, such as emotion (Oudeyer, 2002; Liscombe, Venditti, & Hirschberg, 2003), deception (Newman, Pennebaker, Berry, & Richards, 2003; Enos, Benus, Cautin, Graciarena, Hirschberg, & Shriberg, 2006; Graciarena, Shriberg, Stolcke, Enos, Hirschberg, & Kajarekar, 2006; Hirschberg, Benus, Brenier, Enos, Friedman, Gilman, Girand, Graciarena, Kathol, Michaelis, Pellom, Shriberg, & Stolcke, 2005), speaker charisma (Rosenberg & Hirschberg, 2005), mood (Mishne, 2005), dominance in meetings (Rienks & Heylen, 2006), point of view or subjectivity (Wilson, Wiebe, & Hwa, 2004; Wiebe, Wilson, Bruce, Bell, & Martin, 2004; Wiebe & Riloff, 2005; Stoyanov, Cardie, & Wiebe, 2005; Somasundaran, Ruppenhofer, & Wiebe, 2007), and sentiment or opinion (Turney, 2002; Pang & Lee, 2005; Popescu & Etzioni, 2005; Breck, Choi, & Cardie, 2007). In contrast with these pragmatic phenomena, which may be relatively contextualised or short-lived, personality is usually considered to be a longer term, more stable, aspect of individuals (Scherer, 2003).…”
Section: Introduction (mentioning)
confidence: 99%
“…Localizing and labeling the emotion in the speech data recorded involves a lot of human labor. Despite these factors, an increasing number of researchers is undertaking efforts to acquire emotion data in a natural environment which is also reflected in the growing number of classification studies using natural emotional speech data, see e.g., Fernandez and Picard [62], Vidrascu and Devillers [201], Devillers and Vidrascu [52], Neiberg et al [124], Graciarena et al [67], Truong and van Leeuwen [187], see Table 2.4. As an intermediate, elicitation and Wizard-Of-Oz methods (WOZ) can be used to collect (semi-)spontaneous emotional speech.…”
Section: Data Acquisition and Annotation (mentioning)
confidence: 99%
“…Stress has been extensively investigated by Zhou et al [219], Fernandez and Picard [62], Kwon et al [99] in the context of car-driving and pilots: in environments where critical situations are likely to occur, stress can be useful to detect. Some 'exotic' emotions such as motherese (i.e., child directed speech) and emphatic (Batliner et al [20], Kwon et al [99]), deceptive speech (Graciarena et al [67]), depressed, suicidal speech (Yingthawornsuk et al [217]), and fatigue, sleepiness in speech (Krajewski and Kröger [96]) have also been addressed. Hotspot detection for meeting summarization and/or meeting browsing (i.e., localization of events with a high level of activity in a meeting) has recently gained interest (Neiberg et al [124], Wrede and Shriberg [211]).…”
Section: Data Acquisition and Annotation (mentioning)
confidence: 99%