Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems 2015
DOI: 10.1145/2702123.2702601
Joint Estimation of 3D Hand Position and Gestures from Monocular Video for Mobile Interaction

Abstract: We present a machine learning technique to recognize gestures and estimate metric depth of hands for 3D interaction, relying only on monocular RGB video input. We aim to enable spatial interaction with small, body-worn devices where rich 3D input is desired but the usage of conventional depth sensors is prohibitive due to their power consumption and size. We propose a hybrid classification-regression approach to learn and predict a mapping of RGB colors to absolute, metric depth in real time. We also classify …
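The abstract describes a hybrid classification-regression approach that maps RGB pixel values to gesture classes and to absolute metric depth. A minimal sketch of that general idea, using scikit-learn random forests on synthetic per-pixel data (the feature construction, depth mapping, and gesture labels here are illustrative assumptions, not the paper's actual features or training procedure):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for hand pixels: RGB features with an assumed
# brightness-to-distance relationship (brighter skin pixels = closer hand).
rgb = rng.uniform(0.0, 1.0, size=(2000, 3))
depth_cm = 20.0 + 60.0 * (1.0 - rgb.mean(axis=1))   # hypothetical depth in cm
gesture = (rgb[:, 0] > rgb[:, 2]).astype(int)        # hypothetical 2-class label

# Classification branch: RGB features -> gesture class.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(rgb, gesture)
# Regression branch: RGB features -> absolute metric depth.
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(rgb, depth_cm)

query = rgb[:5]
pred_gesture = clf.predict(query)   # per-pixel gesture class
pred_depth = reg.predict(query)     # per-pixel depth estimate in cm
```

In practice such per-pixel predictions would be aggregated over a segmented hand region (e.g., a mean or median depth) before being used for interaction, as the citing excerpts below describe.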

Cited by 16 publications (6 citation statements)
References 15 publications
“…a jazz hand for simple virtual object overlay. Song et al. [43] use an RGB camera to recognize hand shapes, estimate the mean hand-camera distance, and use the distance for assorted interactions, e.g., selection. However, the result is still not sufficiently precise to resolve the hand-object occlusion.…”
Section: Related Work
confidence: 99%
“…Most recent applications of device-free human sensing make use of sensors that detect perturbations of visible light, sound, or radio waves due to certain human activities. Visible light-based approaches may employ either cameras [17,39,40] or a combination of light-emitting diodes (LEDs) and photodetectors [19,22,23,49]. Camera-based systems apply computer vision techniques to high-resolution images to track human motion or detect certain activities, while shadow patterns are processed in systems that use LEDs and photodetectors.…”
Section: Related Work
confidence: 99%
“…In comparison, our work builds on less heuristic assumptions while accurately detecting fingertips on and above the interacting surface. Taking inspiration from [30,13,22], we use a combination of machine learning, image processing, and robust estimators to solve the challenging vision problem. Our approach is flexible and can be retrained to fit a wide range of depth sensor positions (e.g., in the device itself) and surfaces (e.g., the upper arm).…”
Section: Wearable Technology
confidence: 99%