Proceedings of the 2015 ACM on International Conference on Multimodal Interaction 2015
DOI: 10.1145/2818346.2820771
Viewpoint Integration for Hand-Based Recognition of Social Interactions from a First-Person View

Abstract: Wearable devices are becoming part of everyday life, from first-person cameras (GoPro, Google Glass), to smart watches (Apple Watch), to activity trackers (FitBit). These devices are often equipped with advanced sensors that gather data about the wearer and the environment. These sensors enable new ways of recognizing and analyzing the wearer’s everyday personal activities, which could be used for intelligent human-computer interfaces and other applications. We explore one possible application by investigating…

Cited by 42 publications (16 citation statements) · References 11 publications
“…The effort many researchers put into finding novel and powerful approaches to obtain better results is justified by the possibility of improving not only the localization accuracy, but also the performance of higher-level inference. In fact, it was demonstrated that a good hand segmentation mask can be sufficient for recognizing actions and activities involving the hands with high accuracy [16], [71]. For this reason, pixel-level segmentation has often been used as the basis of higher-level inference methods.…”
Section: Remarks on Hand Segmentation
Confidence: 99%
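As a loose illustration of the statement quoted above (and not the cited authors' actual pipeline), the sketch below classifies an activity label directly from a binary hand-segmentation mask with a small CNN. The architecture, layer sizes, and number of classes are all assumptions made for this example.

```python
# Minimal sketch (assumption, not the papers' model): predict an activity
# label directly from a binary hand-segmentation mask with a small CNN.
import torch
import torch.nn as nn

class MaskActivityNet(nn.Module):
    def __init__(self, num_classes=4):  # number of interaction types is assumed
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, mask):  # mask: (N, 1, H, W) with values in {0, 1}
        x = self.features(mask)
        return self.classifier(x.flatten(1))

# Usage: a batch of 2 random binary masks at 128x128 resolution.
masks = (torch.rand(2, 1, 128, 128) > 0.5).float()
logits = MaskActivityNet()(masks)
print(logits.shape)  # torch.Size([2, 4])
```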
“…Many authors proposed region-based CNNs to detect the hands, exploiting segmentation approaches (Section 3.1) to generate region proposals. Bambach et al. [16], [71] proposed a probabilistic approach for region-proposal generation that combined spatial biases (e.g., reasoning on the position and shape of the hands from training data) and appearance models (e.g., non-parametric modeling of skin color in the YUV color space). To guarantee high coverage, they generated 2,500 regions per frame, which were then classified using CaffeNet [75].…”
Section: Hand Detection as Object Detection
Confidence: 99%
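The following is a minimal sketch of the proposal-scoring idea summarized above, assuming a non-parametric U-V skin histogram and a spatial prior map that would normally be estimated from training data. The function names, bin ranges, and scoring rule are illustrative and not the exact procedure of [16], [71].

```python
# Illustrative sketch (not the authors' exact code): score candidate hand
# regions by combining a non-parametric skin-color model in YUV with a
# spatial prior learned from training data.
import numpy as np

def rgb_to_yuv(img):
    """img: (H, W, 3) float RGB in [0, 1] -> (H, W, 3) YUV."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return img @ m.T

def skin_probability(img, uv_hist, bins=32):
    """Look up a per-pixel skin likelihood in a (bins x bins) U-V histogram."""
    yuv = rgb_to_yuv(img)
    # U and V lie roughly in [-0.5, 0.65] for RGB in [0, 1]; map to bin indices.
    u = np.clip(((yuv[..., 1] + 0.5) * bins).astype(int), 0, bins - 1)
    v = np.clip(((yuv[..., 2] + 0.7) * bins / 1.4).astype(int), 0, bins - 1)
    return uv_hist[u, v]

def score_proposals(img, boxes, uv_hist, spatial_prior):
    """Average (skin likelihood * spatial prior) inside each box (x0, y0, x1, y1)."""
    p = skin_probability(img, uv_hist) * spatial_prior
    return np.array([p[y0:y1, x0:x1].mean() for x0, y0, x1, y1 in boxes])

# Toy usage with random stand-ins for the learned models.
rng = np.random.default_rng(0)
frame = rng.random((240, 320, 3))
uv_hist = rng.random((32, 32))       # would be estimated from labeled skin pixels
spatial_prior = np.ones((240, 320))  # would encode where hands tend to appear
boxes = [(10, 10, 60, 60), (100, 50, 180, 150)]
print(score_proposals(frame, boxes, uv_hist, spatial_prior))
```

In the actual method, a large set of such proposals (2,500 per frame, per the quote above) would then be passed to a CNN classifier such as CaffeNet.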
“…Several recent papers have shown the potential for combining first-person video analysis with evidence from other types of synchronized video, including from other first-person cameras [3,29], multiple third-person cameras [26], or even hand-mounted cameras [5]. However, these papers assume that a single person appears in each video, avoiding the person-level correspondence problem.…”
Section: Related Work
Confidence: 99%
“…Despite its importance, we are aware of very little work that tries to address this problem. Several recent papers propose using multiple cameras for joint first-person recognition [3,5,26,29], but make simplistic assumptions, such as that only one person appears in the scene. Using visual SLAM to infer the first-person camera trajectory and map it to third-person cameras (e.g., [17,19]) works well in some settings, but can fail in crowded environments when long-term precise localization is needed and when the first-person video has significant motion blur.…”
Section: Introduction
Confidence: 99%