Hand detection is a crucial pre-processing procedure for many human hand related computer vision tasks, such as hand pose estimation, hand gesture recognition, human activity analysis, and so on. However, reliably detecting multiple hands from cluttering scenes remains to be a challenging task because of complex appearance diversities of dexterous human hands (e.g., different hand shapes, skin colors, illuminations, orientations, and scales, etc.) in color images. To tackle this problem, an accurate hand detection method is proposed to reliably detect multiple hands from a single color image using a hybrid detection/reconstruction convolutional neural networks (CNN) framework, in which regions of hands are detected and appearances of hands are reconstructed in parallel by sharing features extracted from a region proposal layer, and the proposed model is trained in an end-to-end manner. Furthermore, it is observed that the generative adversarial network (GAN) could further boost the detection performance by generating more realistic hand appearances. The experimental results show that the proposed approach outperforms the state-of-the-art on public challenging hand detection benchmarks.Sensors 2020, 20, 192 2 of 21 and hand-related applications in an unconstrained environment (complex background and the number of hands in an image is unknown) will be an important trend in the future. In this condition, hand detection in an unconstrained environment becomes a new bottleneck in the hand-related works. Thus, the high precision hand detection method will be a crucial step in the pipeline for hand-related applications in unconstrained environment. In this paper, we focus on the hand detection algorithm.Traditional hand detection methods primarily utilize low-level image features such as skin color [10] and shape [11,12] for hand region detection. Nowadays, convolutional neural networks (CNN) based detection approaches [13][14][15][16][17][18][19] are proved to be more robust and accurate [20][21][22] due to the discriminative deep features learned. However, compared to common objects, human hands are highly articulated, appearing in various orientations, scales, shapes, skin colors, and sometimes partial occlusions, therefore reliably detecting multiple human hands from unconstrained cluttering scenes remains to be a challenging problem.To tackle this problem, we propose an approach to accurately detect human hands from single color images by reconstructing the hand appearances. It can also be applied to video clips, as video clips can be considered as sequences of single images. The spirit of our approach is primarily oriented from multitask learning [23] which improves generalization of the network by learning tasks in parallel while using a shared representation. The quality of the hand detection task is closely related to the diversity of hand appearances in terms of hand shape, skin color, orientation, scale, and partial occlusion, etc. As a result, the shared information contained in the training signal of the ha...