2013
DOI: 10.1007/978-3-642-40303-3_19

Results and Analysis of the ChaLearn Gesture Challenge 2012

Abstract: The Kinect™ camera has revolutionized the field of computer vision by making available low-cost 3D cameras recording both RGB and depth data, using a structured-light infrared sensor. We recorded and made available a large database of 50,000 hand and arm gestures. With these data, we organized a challenge emphasizing the problem of learning from very few examples. The data are split into subtasks, each using a small vocabulary of 8 to 12 gestures, related to a particular application domain: hand sign…

Cited by 44 publications (39 citation statements)
References: 14 publications
“…In the case of TV broadcasts the weak supervision is provided by aligned subtitles (that specify a temporal interval where the word may occur), though the supervision is also noisy as the subtitle word may not be signed. In the case of linguistic research datasets (and some gesture datasets [19]) the supervision is often at the video clip level, rather than a tighter temporal interval.…”
Section: Discussion (mentioning), confidence: 99%
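The quoted point about clip-level (weak) supervision can be illustrated with a small sketch. If a clip is only known to contain a sign somewhere, one simple strategy (our assumption, not the citing paper's method) is to score every frame with some per-frame detector and pick the fixed-length window with the highest mean score. The per-frame scores, the window length, and the name `localize_in_clip` are hypothetical.

```python
from typing import List, Tuple


def localize_in_clip(frame_scores: List[float], window: int) -> Tuple[int, float]:
    """Return (start_frame, mean_score) of the highest-scoring window.

    Hypothetical helper: under clip-level supervision we only know the sign
    occurs somewhere in the clip, so we pick the most likely temporal interval.
    """
    if not 1 <= window <= len(frame_scores):
        raise ValueError("window must be between 1 and the clip length")
    # Running sum over a sliding window: O(n) instead of O(n * window).
    current = sum(frame_scores[:window])
    best_start, best_sum = 0, current
    for start in range(1, len(frame_scores) - window + 1):
        # Add the frame entering the window, drop the frame leaving it.
        current += frame_scores[start + window - 1] - frame_scores[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start, best_sum / window


# Usage: the peak of the (made-up) scores lies at frames 2-3.
print(localize_in_clip([0.0, 1.0, 5.0, 4.0, 0.0], 2))
```

A fixed window length keeps the sketch short; a real system would likely search over several lengths or learn the boundaries.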
“…This is inherently expensive and does not scale to large, evolving gesture languages with high levels of variation. As a result, several recent works have attempted to learn gestures at the other extreme, from single training examples using one-shot learning [16,17,19,20,22,24,33]. However, given the vast variability in how gestures are performed, and the variation in people and camera viewpoints, learning accurate, generalizable models with so little supervision is somewhat challenging, to say the least.…”
Section: Introduction (mentioning), confidence: 99%
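One-shot learning, as quoted above, means each gesture class has a single training example. A common baseline for this setting (named here as an illustration, not as the method of any cited work) is 1-nearest-neighbour classification under dynamic time warping (DTW), which tolerates differences in execution speed. The sketch assumes each video has already been reduced to one numeric feature vector per frame (for example, hand coordinates); the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # assumed: one feature vector per video frame


def dtw_distance(a: List[Frame], b: List[Frame]) -> float:
    """Dynamic time warping cost between two feature sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b's frame is stretched
                                 cost[i][j - 1],      # b advances, a's frame is stretched
                                 cost[i - 1][j - 1])  # both advance together
    return cost[n][m]


def one_shot_classify(query: List[Frame], exemplars: Dict[str, List[Frame]]) -> str:
    """Label the query with the gesture whose single exemplar is nearest under DTW."""
    return min(exemplars, key=lambda label: dtw_distance(query, exemplars[label]))


# Usage: the query is a slower execution of "wave" (its first frame is held twice).
exemplars = {"wave": [(0, 0), (1, 0), (2, 0)], "raise": [(0, 0), (0, 1), (0, 2)]}
print(one_shot_classify([(0, 0), (0, 0), (1, 0), (2, 0)], exemplars))
```

The quadratic DTW table is fine for short clips; the variability the quote describes (people, viewpoints) is exactly what such a simple template matcher struggles with.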
“…HMMs and CRFs are widely applied for sequential gesture recognition [7]. Another approach is to start from static posture recognition and to apply standard methods used for unsequential data, such as SVMs, on the frame level, and subsequently combine the results into the sequence level.…”
Section: Related Work (mentioning), confidence: 99%
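The second approach in the quote (classify frames independently, then combine the results at the sequence level) can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM the quote mentions, and majority voting is our assumed combination rule; all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Frame = Sequence[float]  # assumed: one feature vector per video frame


def fit_centroids(frames: List[Frame], labels: List[str]) -> Dict[str, List[float]]:
    """Per-class mean feature vector (a stand-in for a frame-level SVM)."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, y in zip(frames, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def classify_frame(x: Frame, centroids: Dict[str, List[float]]) -> str:
    """Static posture recognition: nearest class centroid for one frame."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def classify_sequence(seq: List[Frame], centroids: Dict[str, List[float]]) -> str:
    """Label every frame independently, then combine by majority vote."""
    votes = Counter(classify_frame(x, centroids) for x in seq)
    return votes.most_common(1)[0][0]


# Usage: two of the three frames look like "point", so the sequence is "point".
centroids = fit_centroids([(0, 0), (0, 1), (5, 5), (5, 6)],
                          ["rest", "rest", "point", "point"])
print(classify_sequence([(0, 0), (4, 5), (5, 5)], centroids))
```

Majority voting discards frame order entirely, which is the trade-off against the sequential HMM/CRF models the quote mentions first.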
“…Kinect was used because of its ability to gather color and depth information from video streams. For 2 years, a vast data set (CGD11), of both development and validation batches, was used worldwide as training and testing data in the competition; the results for both years were reported by Guyon et al. (2012, 2013) with partial success being achieved. Among the results reported, the Levenshtein distance (LD) (Levenshtein, 1966) was between 0.15 and 0.3 (the ideal distance is represented by 0 and the worst by 1).…”
Section: One-shot Learning (mentioning), confidence: 99%
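The Levenshtein distance in the quote compares the predicted sequence of gesture labels for a video with the true sequence, counting insertions, deletions and substitutions. A minimal sketch follows; normalizing the summed distances by the total number of true gestures is our assumption about how a dataset-level score is formed, and the function names are hypothetical. Under this normalization, predictions padded with many spurious gestures can push the value above 1.

```python
from typing import List, Sequence


def levenshtein(truth: Sequence[str], pred: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions between label sequences."""
    prev = list(range(len(pred) + 1))
    for i, t in enumerate(truth, start=1):
        curr = [i] + [0] * len(pred)
        for j, p in enumerate(pred, start=1):
            curr[j] = min(prev[j] + 1,             # true gesture missed by pred
                          curr[j - 1] + 1,         # extra gesture in pred
                          prev[j - 1] + (t != p))  # substitution (free if labels match)
        prev = curr
    return prev[-1]


def challenge_score(truths: List[Sequence[str]], preds: List[Sequence[str]]) -> float:
    """Summed edit distance divided by the total number of true gestures (assumed)."""
    total_errors = sum(levenshtein(t, p) for t, p in zip(truths, preds))
    total_gestures = sum(len(t) for t in truths)  # must be > 0
    return total_errors / total_gestures


# Usage: one substitution over three true gestures gives a score of 1/3.
print(challenge_score([["a", "b"], ["c"]], [["a", "b"], ["d"]]))
```

Keeping only the previous row of the dynamic-programming table reduces memory to O(len(pred)) without changing the result.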