Accurate and fast hand detection and gesture recognition for hand understanding remain challenging because of the diversity of hands and the complexity of scenes in color images. To address this problem, we propose a novel SqueezeNet and fusion network-based fully convolutional network (SF-FCNet) that performs hand detection and gesture recognition in color images accurately and quickly. First, we adopt the first 17 layers of the lightweight SqueezeNet as the hand feature extraction network, which accelerates detection and recognition by greatly compressing the network parameters. Second, we design a precise hand prediction fusion network by adding a residual structure to the deconvolutional network to integrate high- and low-level hand features, and hand detection and gesture recognition are performed on a single convolutional layer at multiple scales to improve precision and reduce computational cost. Verification on the Oxford hand dataset shows that SF-FCNet reaches a precision of 84.1% at a speed of 32 FPS. The experimental results show that SF-FCNet substantially enhances the precision and speed of hand detection and gesture recognition on three benchmark datasets and generalizes well to a homemade test set.
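As a rough illustration of why a SqueezeNet-style backbone compresses parameters, the sketch below counts the weights of one Fire module (a 1x1 squeeze convolution followed by parallel 1x1 and 3x3 expand convolutions) and compares them with a plain 3x3 convolution of the same input and output width. The channel sizes are those of the `fire2` module in the original SqueezeNet and are an assumption here; the abstract does not give SF-FCNet's actual layer configuration, and the function names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a square 2D convolution."""
    return in_ch * out_ch * kernel * kernel + out_ch


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Parameter count of a SqueezeNet Fire module."""
    return (
        conv_params(in_ch, squeeze, 1)        # squeeze layer (1x1)
        + conv_params(squeeze, expand1x1, 1)  # expand branch (1x1)
        + conv_params(squeeze, expand3x3, 3)  # expand branch (3x3)
    )


def conv3x3_params(in_ch, out_ch):
    """Plain 3x3 convolution producing the same number of output channels."""
    return conv_params(in_ch, out_ch, 3)


# Assumed sizes: 96 input channels, 16 squeeze filters, 64 + 64 expand filters.
fire = fire_params(96, 16, 64, 64)
plain = conv3x3_params(96, 64 + 64)
print(f"Fire module: {fire} parameters, plain 3x3 conv: {plain} parameters")
print(f"Reduction factor: {plain / fire:.1f}x")
```

The saving comes from the squeeze layer: the 3x3 filters see only 16 input channels instead of 96, so their cost falls roughly in proportion.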
In recent years, a growing volume of image data has come from various sensors, and object detection plays a vital role in image understanding. For object detection in complex scenes, more detailed information must be extracted from the image to improve detection accuracy. In this paper, we propose an object detection algorithm that is jointly trained with semantic segmentation (SSOD). First, we construct a feature extraction network that integrates an hourglass-structured network with an attention mechanism layer to extract and fuse multi-scale features, generating high-level features with rich semantic information. Second, semantic segmentation is used as an auxiliary task so that the algorithm performs multi-task learning. Finally, multi-scale features are used to predict the location and category of each object. The experimental results show that our algorithm substantially enhances object detection performance, consistently outperforms the three comparison algorithms, and runs at real-time detection speed.
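The multi-task idea can be sketched as a weighted sum of a detection loss and an auxiliary segmentation loss. This is a minimal sketch under stated assumptions: the abstract does not specify the loss functions or their weighting, so the per-pixel cross-entropy, the `seg_weight` parameter, and the function names below are all hypothetical.

```python
import math


def pixel_cross_entropy(probs, labels):
    """Mean cross-entropy over pixels.

    probs[i] is the predicted class-probability list for pixel i,
    labels[i] is the index of its ground-truth class.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def joint_loss(det_loss, seg_loss, seg_weight=0.5):
    """Detection loss plus a weighted auxiliary segmentation loss (assumed form)."""
    return det_loss + seg_weight * seg_loss


# Toy example: two pixels, two classes, and a made-up detection loss value.
seg = pixel_cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1])
total = joint_loss(det_loss=1.25, seg_loss=seg, seg_weight=0.5)
print(f"segmentation loss: {seg:.4f}, joint loss: {total:.4f}")
```

In a setup like this, the segmentation term influences only training; at inference time the auxiliary branch can be dropped, which is one common reason for treating segmentation as an auxiliary task.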