Although a large number of works have been done to explore efficient face detection in various scenes, practical face detection in unconstrained condition of varying lighting, pose, scale, and occlusion remains a challenging task. The primary limitation of existing solutions is that they are vulnerable to influence from the wild environment. Practical features extraction plays a crucial part in the face detection of low-quality images. Based on the EfficientNet, this paper builds a novel pyramid attention network to integrate multilevel features with rich context messages. Firstly, a context model is exploited to increase the receptive fields at the beginning of the network. Secondly, stacked pyramid feature attention modules and feature fusion simultaneously selectively integrate the contextual information and enable spatial details, thus enhancing the capacity to detect faces on hard images. In addition, hard samples augmentation of the training sets is conducted, which is beneficial for improving the accuracy. A thorough study on ablation verifies the effect of the proposed strategies. Moreover, extensive experiments on Wider Face and FDDB datasets, remarkably pushing the accuracy both of 96.3% (2%↑), demonstrate the performance of the proposed deep face detector which is superior and outperforms most of the preexisting methods. The method presented in this paper can perform the task of face detection in surveillance images well.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.