2018
DOI: 10.1109/tip.2017.2779600

Joint Hand Detection and Rotation Estimation Using CNN

Abstract: Hand detection is essential for many hand-related tasks, e.g. parsing hand pose and understanding gestures, which are extremely useful for robotics and human-computer interaction. However, hand detection in uncontrolled environments is challenging due to the flexibility of the wrist joint and cluttered backgrounds. We propose a deep-learning-based approach that detects hands and calibrates in-plane rotation under supervision at the same time. To guarantee the recall, we propose a context aware proposal genera…

citations
Cited by 85 publications
(68 citation statements)
references
References 34 publications
“…MaskRCNN is essentially Hand-CNN without a contextual attention module. We also train a Hand-CNN detector without the semantics context component and another detector without the similarity context component. As can be seen from Table 3, both types of contextual cues are useful for hand detection.…”
Table embedded in the excerpt:
Method                  AP
DPM [11]                36.8%
ST-CNN [16]             40.6%
RCNN [10]               42.3%
Context + Skin [22]     48.2%
RCNN + Skin [26]        49.5%
FasterRCNN [25]         55.7%
Rotation Network [7]    58.1%
Hand Keypoint [28]      68.6%
Hand-CNN (proposed)     78.8%
Section: Hand Detection Performance (mentioning)
confidence: 99%
“…Figure 2: Novel and transparent representation of the rotation angle. We use the rotation map to store the rotation angle instead of adding rotation and derotation layers [15] to networks.…”
Section: Introduction (mentioning)
confidence: 99%
“…We develop the Complementary Weighted Fusion (CWF) block to make full use of the distinctive features among multiple layers and exploit complementary information. Different from previous methods using additional rotation and derotation layers (Deng et al 2018), our model generates the rotation map to represent the rotated hand regions effectively. Moreover, we design the multi-scale loss to accelerate the training process by providing supervision to the intermediate layers of the network.…”
Section: Introduction (mentioning)
confidence: 99%
“…On the other hand, hands are typically in a rotated pose, and rarely being precisely horizontal or vertical in real scenes. To predict more accurate locations and poses of hands, (Deng et al 2018) design a shared network for learning features, a rotation network to predict the rotation angle of region proposals, a derotation layer to obtain axis-aligned rotating feature maps and a detection network for the last classification task. However, the method is of great complexity to handle the rotated distances, even when carefully designed.…”
Section: Introduction (mentioning)
confidence: 99%
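
The statement above describes a rotation network that predicts an in-plane angle and a derotation layer that brings the rotated content back to an axis-aligned frame. The geometric idea can be sketched roughly as follows. This is a minimal illustration on 2D points only, not the cited paper's layer (which the statement says operates on feature maps); the function names `rotate_points` and `derotate_points`, the counter-clockwise angle convention, and the example box are assumptions made for the sketch.

```python
import math


def rotate_points(points, center, angle_deg):
    """Rotate 2D points counter-clockwise by angle_deg about center."""
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return rotated


def derotate_points(points, center, predicted_angle_deg):
    """Undo an in-plane rotation by applying the opposite angle.

    predicted_angle_deg stands in for the output of a rotation
    estimator (hypothetical here; no network is involved).
    """
    return rotate_points(points, center, -predicted_angle_deg)


# Hypothetical axis-aligned box corners and a 30-degree hand rotation.
box = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
center = (2.0, 1.0)
rotated_box = rotate_points(box, center, 30.0)
recovered_box = derotate_points(rotated_box, center, 30.0)
```

If the predicted angle matches the true rotation, `recovered_box` coincides with `box` up to floating-point error, which is why the accuracy of the angle estimate matters for the downstream detection step.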