2013 IEEE International Conference on Systems, Man, and Cybernetics
DOI: 10.1109/smc.2013.730
Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations

Abstract: This article presents experiments on the automatic detection of laughter and fillers, two of the most important nonverbal behavioral cues observed in spoken conversations. The proposed approach is fully automatic and segments audio recordings captured with mobile phones into four types of interval: laughter, filler, speech, and silence. The segmentation methods rely not only on probabilistic sequential models (in particular Hidden Markov Models), but also on Statistical Language Models aimed at estimating the a-pri…
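The segmentation described in the abstract can be illustrated with a minimal sketch: frame-level scores decoded with the Viterbi algorithm into the four interval types the paper names. Everything below is illustrative, not the paper's trained models — the emission probabilities, transition matrix, and initial distribution are toy numbers chosen so the example is self-contained.

```python
import numpy as np

# The four interval types from the paper's abstract.
STATES = ["laughter", "filler", "speech", "silence"]

def viterbi(emission_logp, transition_logp, initial_logp):
    """Return the most likely state sequence for a (T, S) matrix of
    per-frame emission log-probabilities."""
    T, S = emission_logp.shape
    delta = np.full((T, S), -np.inf)       # best log-score ending in state s at t
    psi = np.zeros((T, S), dtype=int)      # backpointers
    delta[0] = initial_logp + emission_logp[0]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] + log P(state j | state i)
        scores = delta[t - 1][:, None] + transition_logp
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + emission_logp[t]
    # Backtrack from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return [STATES[s] for s in path]

# Toy example: 5 frames whose emissions favour speech, then silence.
emissions = np.log(np.array([
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.60, 0.20],
    [0.05, 0.05, 0.20, 0.70],
    [0.05, 0.05, 0.10, 0.80],
]))
# "Sticky" transitions discourage rapid switching between classes.
A = np.log(np.full((4, 4), 0.05) + np.eye(4) * 0.80)
pi = np.log(np.full(4, 0.25))
print(viterbi(emissions, A, pi))
```

The sticky transition matrix plays the role the paper assigns to its sequential models: it smooths the frame-level decisions into contiguous intervals rather than letting each frame be labelled independently.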

Cited by 25 publications (20 citation statements)
References 17 publications
“…We use the SSPNet Vocalization corpus (SVC) (Salamin et al., 2013) for the experiments in this paper. This data was used as the benchmark during the Interspeech challenge and provides a platform for comparison of various algorithmic methods (Kaya et al., 2013; Pammi and Chetouani, 2013; Krikke and Truong, 2013; Brueckner and Schulter, 2014; An et al., 2013). The dataset consists of 2763 audio clips, each 11 seconds long.…”
Section: Database
confidence: 99%
“…We list the statistics for laughter and filler events over the entire database in Table 1. For more details on the dataset please refer to (Salamin et al., 2013; Schuller et al., 2013).…”
Section: Database
confidence: 99%
“…[10,11,12]) is to extract a huge variety of audio-based features, and then perform classification at the utterance level. Notice that no machine learning is done at the frame level; however, in ASR (and in similar tasks such as laughter detection [13,14]) fine-tuned solutions exist on how frames should be classified. Unfortunately, these are usually ignored in computational paralinguistics, and in the notable exceptions when they are not (e.g.…”
Section: Introduction
confidence: 99%
“…The Social Signals Sub-Challenge of the Interspeech 2013 Computational Paralinguistics Challenge (ComParE) [5] further kindled research activities on laughter and filler detection [6,7,8,9] by providing a baseline database to compare research efforts.…”
Section: Introduction
confidence: 99%