2012
DOI: 10.1007/978-3-642-31479-7_14
Look at Who’s Talking: Voice Activity Detection by Automated Gesture Analysis

Abstract: This paper proposes an approach for Voice Activity Detection (VAD) based on the automatic measurement of gesturing. The main motivation of the work is that gestures have been shown to be tightly correlated with speech; hence, they can be considered reliable evidence that a person is talking. The use of gestures rather than speech for performing VAD can be helpful in many situations (e.g., surveillance and monitoring in public spaces) where speech cannot be obtained for technical, legal or ethical issues…
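The abstract's core idea, using measured gesturing as a proxy for voice activity, can be illustrated with a minimal sketch. This is a hypothetical toy, not the authors' method: the function name `gesture_vad`, the frame-difference energy measure, and the threshold are all assumptions introduced here for illustration.

```python
import numpy as np

def gesture_vad(frames, threshold=0.1):
    """Label each video frame as speech (True) or non-speech (False)
    using body motion as a proxy for gesturing.

    frames    : array of shape (T, H, W), grayscale frames in [0, 1]
    threshold : motion-energy level above which a frame counts as gesturing
                (hypothetical value, not from the paper)
    """
    # Motion energy: mean absolute pixel difference between consecutive frames.
    diffs = np.abs(np.diff(frames.astype(float), axis=0))
    energy = diffs.mean(axis=(1, 2))
    # The first frame has no predecessor; label it non-speech.
    return np.concatenate([[False], energy > threshold])
```

A real system would, as the paper suggests, measure gesturing more robustly (e.g., tracking body parts) rather than raw pixel change, but the thresholding structure is the same.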


Cited by 20 publications (15 citation statements); references 14 publications.
“…In particular, domains like Affective Computing [31] and Social Signal Processing [43] adopted nonverbal behavioral cues as physical, machine-detectable evidence of emotional and social phenomena, respectively. Research efforts targeted a wide spectrum of problems, including conflict detection [28], communication dynamics [7,25], mimicry measurement [10], early detection of developmental and cognitive diseases [37], role recognition [38], prediction of negotiation outcomes [9], video surveillance [4,5,6,8], etc. Furthermore, several works were dedicated to the automatic prediction of traits likely to be relevant in a teaching context, in particular personality [21,23,30] and dominance [13,27,34,35].…”
Section: Computing
Confidence: 99%
“…However, audio-related cues cannot be extracted reliably in airports. Cristani et al (2012) use body behavior and gestures to classify a video of four participants having a conversation into intervals of speech or non-speech. The method achieves 72% accuracy, but the setting is static.…”
Section: Spokesperson Detection
Confidence: 99%
“…In [49], gesturing is used to infer who is talking when in a surveillance scenario, realizing, through statistical analysis, a simple form of diarization (detection of who speaks when, [50]). Indeed, cognitive scientists showed that speech and gestures are so tightly intertwined that every important investigation of language has taken gestures into account [51].…”
Section: Gesture and Posture
Confidence: 99%
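The diarization idea in the excerpt above, inferring who speaks when from who is gesturing, can be caricatured as follows. This is a hypothetical sketch, not the statistical analysis of the cited work; the function name, input format, and threshold are invented for illustration.

```python
import numpy as np

def gesture_diarization(energies, threshold=0.1):
    """Crude 'who speaks when' estimate from gesturing.

    energies  : array of shape (T, P), per-person motion energy at each
                of T time steps for P people
    threshold : minimum energy for anyone to count as speaking
                (hypothetical value)

    Returns an array of length T: the index of the most active person
    at each time step, or -1 where nobody gestures enough (silence).
    """
    energies = np.asarray(energies, dtype=float)
    speaker = energies.argmax(axis=1)
    speaker[energies.max(axis=1) < threshold] = -1
    return speaker
```

In practice a diarization system would also smooth these labels over time, since both speech and gesturing persist over multi-frame spans.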