Joint Hand Detection and Rotation Estimation Using CNN

Deng, Xiaoming; Zhang, Yinda; Yang, Shuo; Tan, Ping; Chang, Liang; Yuan, Ye; Wang, Hongan

doi:10.1109/tip.2017.2779600

Cited by 85 publications

(68 citation statements)

References 34 publications


“…MaskRCNN is essentially Hand-CNN without a contextual attention module. We also train a Hand-CNN detector without the semantics context component and another detector without the similarity context component. As can be seen from Table 3, both types of contextual cues are useful for hand detection.…”

Table text interleaved in the extracted quote:

Method                   AP
DPM [11]                 36.8%
ST-CNN [16]              40.6%
RCNN [10]                42.3%
Context + Skin [22]      48.2%
RCNN + Skin [26]         49.5%
FasterRCNN [25]          55.7%
Rotation Network [7]     58.1%
Hand Keypoint [28]       68.6%
Hand-CNN (proposed)      78.8%

Section: Hand Detection Performance (mentioning)

confidence: 99%

Contextual Attention for Hand Detection in the Wild

Narasimhaswamy, Wei, Wang, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


We present Hand-CNN, a novel convolutional network architecture for detecting hand masks and predicting hand orientations in unconstrained images. Hand-CNN extends MaskRCNN with a novel attention mechanism to incorporate contextual cues in the detection process. This attention mechanism can be implemented as an efficient network module that captures non-local dependencies between features. This network module can be inserted at different stages of an object detection network, and the entire detector can be trained end-to-end. We also introduce a large-scale annotated hand dataset containing hands in unconstrained images for training and evaluation. We show that Hand-CNN outperforms existing methods on several datasets, including our hand detection benchmark and the publicly available PASCAL VOC human layout challenge. We also conduct ablation studies on hand detection to show the effectiveness of the proposed contextual attention module.
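As a rough illustration of what "a module that captures non-local dependencies between features" can look like, the sketch below implements a generic self-attention style block in NumPy. It is not the Hand-CNN attention module: the function name `non_local_block`, the projection matrices `w_q`, `w_k`, `w_v`, the scaling by the square root of the projection width, and the residual connection are all assumptions chosen for illustration.

```python
import numpy as np

def non_local_block(x, w_q, w_k, w_v):
    """Generic non-local (self-attention) block; hypothetical, not Hand-CNN's module.

    x:        (N, C) features, one row per spatial position.
    w_q, w_k: (C, D) projections used to score pairs of positions.
    w_v:      (C, C) projection applied to the features being aggregated.
    Returns the residual output (N, C) and the attention weights (N, N).
    """
    q = x @ w_q
    k = x @ w_k
    # Pairwise affinity between every pair of positions.
    scores = q @ k.T / np.sqrt(q.shape[1])
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each position receives a weighted sum over all positions, added residually.
    out = x + weights @ (x @ w_v)
    return out, weights

rng = np.random.default_rng(0)
h, w, c, d = 4, 5, 8, 4                      # toy feature-map size (assumed)
feature_map = rng.standard_normal((h, w, c))
x = feature_map.reshape(h * w, c)            # flatten spatial positions
out, weights = non_local_block(
    x,
    rng.standard_normal((c, d)),
    rng.standard_normal((c, d)),
    rng.standard_normal((c, c)),
)
out_map = out.reshape(h, w, c)               # same shape as the input map
```

Because the output has the same shape as the input, a block of this kind can in principle be placed between existing stages of a detector, which is consistent with the abstract's remark that the module can be inserted at different stages.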

“…Figure 2: Novel and transparent representation of the rotation angle. We use the rotation map to store the rotation angle instead of adding rotation and derotation layers [15] to networks.…”

Section: Introduction (mentioning)

confidence: 99%

Towards interpretable and robust hand detection via pixel-wise prediction

Liu, Zhang, Luo, et al. 2020

Pattern Recognition


The lack of interpretability of existing CNN-based hand detection methods makes it difficult to understand the rationale behind their predictions. In this paper, we propose a novel neural network model, which introduces interpretability into hand detection for the first time. The main improvements include: (1) Detect hands at pixel level to explain what pixels are the basis for its decision and improve transparency of the model. (2) The explainable Highlight Feature Fusion block highlights distinctive features among multiple layers and learns discriminative ones to gain robust performance. (3) We introduce a transparent representation, the rotation map, to learn rotation features instead of complex and non-transparent rotation and derotation layers. (4) Auxiliary supervision accelerates the training process, which saves more than 10 hours in our experiments. Experimental results on the VIVA and Oxford hand detection and tracking datasets show competitive accuracy of our method compared with state-of-the-art methods with higher speed. Dan Liu and Tiejian Luo contributed equally and should be considered co-first authors. Models and code are available at https://isrc.iscas.ac.cn/gitlab/research/pr2020phdn.


“…We develop the Complementary Weighted Fusion (CWF) block to make full use of the distinctive features among multiple layers and exploit complementary information. Different from previous methods using additional rotation and derotation layers (Deng et al 2018), our model generates the rotation map to represent the rotated hand regions effectively. Moreover, we design the multi-scale loss to accelerate the training process by providing supervision to the intermediate layers of the network.…”

Section: Introduction (mentioning)

confidence: 99%

“…On the other hand, hands are typically in a rotated pose, and rarely being precisely horizontal or vertical in real scenes. To predict more accurate locations and poses of hands, (Deng et al 2018) design a shared network for learning features, a rotation network to predict the rotation angle of region proposals, a derotation layer to obtain axis-aligned rotating feature maps and a detection network for the last classification task. However, the method is of great complexity to handle the rotated distances, even when carefully designed.…”

Section: Introduction (mentioning)

confidence: 99%

Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently

Liu, Zhang, et al. 2019

AAAI


Existing hand detection methods usually follow the pipeline of multiple stages with high computation cost, i.e., feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection. In this paper, we propose a new Scale Invariant Fully Convolutional Network (SIFCN) trained in an end-to-end fashion to detect hands efficiently. Specifically, we merge the feature maps from high to low layers in an iterative way, which handles different scales of hands better with less time overhead compared with simply concatenating them. Moreover, we develop the Complementary Weighted Fusion (CWF) block to make full use of the distinctive features among multiple layers to achieve scale invariance. To deal with rotated hand detection, we present the rotation map to get rid of complex rotation and derotation layers. Besides, we design the multi-scale loss scheme to accelerate the training process significantly by adding supervision to the intermediate layers of the network. Compared with the state-of-the-art methods, our algorithm shows comparable accuracy and runs 4.23 times faster on the VIVA dataset, and achieves better average precision on the Oxford hand detection dataset at a speed of 62.5 fps. Corresponding author: Libo Zhang (libo@iscas.ac.cn).
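The citation statements above describe the cited method as predicting a rotation angle for each region proposal and then using a derotation layer to obtain axis-aligned feature maps. The sketch below shows the basic idea of such a resampling step on a single-channel map with nearest-neighbour sampling. It is only an illustration under stated assumptions: the function name `derotate`, the sign convention for the angle, and the zero padding are hypothetical and are not taken from Deng et al.

```python
import numpy as np

def derotate(feature, angle_deg):
    """Resample a 2-D map about its centre by a given angle (hypothetical helper).

    Each output cell copies the nearest input cell found by rotating the
    output coordinate about the map centre. Cells whose source falls
    outside the map are left at zero.
    """
    h, w = feature.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)

    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Source coordinates for every output cell.
    src_x = cos_t * dx - sin_t * dy + cx
    src_y = sin_t * dx + cos_t * dy + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)

    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(feature)
    out[inside] = feature[sy[inside], sx[inside]]
    return out

# Toy 3x3 "feature map"; the angle would come from a rotation-estimation step.
f = np.arange(9.0).reshape(3, 3)
rotated = derotate(f, 90.0)
restored = derotate(rotated, -90.0)   # applying the opposite angle undoes it
```

A trainable layer would need differentiable (for example bilinear) sampling and would operate on multi-channel proposals, so this sketch only conveys the geometry. The rotation-map approach described in the abstract above avoids this resampling step altogether.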
