Head detection has recently become a widely used form of target detection. It has great application value for improving security prevention and control in public places and for enhancing target tracking and identification in national defense, criminal investigation, and other fields. However, accurately detecting small targets at long distances remains difficult, and current methods often lack optimization of multi-resolution features. Therefore, the authors propose a one-stage detection network, CFNet (cross-layer feature fusion and fusion weight attention network). CFNet includes a fusion weight attention mechanism module (FWAM) that assigns different weights to the fused features to distinguish their relative importance. FWAM increases the weights of features that carry strong information, so that the fused features focus on the feature points most useful for accurate head detection. In addition, a cross-layer feature fusion module fuses information from feature maps of different resolutions to compensate for the drop in detection accuracy caused by informative features being missed at low resolution. A connection network for contextual information fusion is constructed, and weight parameter settings are introduced to optimize detection after features of different resolutions are fused. To evaluate the effectiveness of the network, experiments are performed on the SCUT-HEAD PartA and Brainwash datasets. The results show that the proposed network outperforms the existing methods it is compared with, demonstrating its robustness and effectiveness.
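The abstract does not specify how FWAM computes its fusion weights or how the cross-layer module aligns resolutions. As a rough illustration only, the sketch below assumes one learnable scalar weight per input feature map, normalized with a softmax, and nearest-neighbour upsampling to bring a low-resolution map to the size of a higher-resolution one before fusion. The functions `upsample_nearest`, `softmax`, and `weighted_fuse` are hypothetical names, and feature maps are reduced to single-channel 2D lists so the example runs without any third-party library.

```python
import math
from typing import List

# A single-channel feature map represented as a 2D list (rows of floats).
Map2D = List[List[float]]


def upsample_nearest(fmap: Map2D, factor: int) -> Map2D:
    """Nearest-neighbour upsampling: repeat each value `factor` times in both directions.

    Assumption: cross-layer fusion aligns resolutions this way; the abstract does not say.
    """
    return [
        [row[j // factor] for j in range(len(row) * factor)]
        for row in fmap
        for _ in range(factor)
    ]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax that turns raw scores into weights summing to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fuse(maps: List[Map2D], logits: List[float]) -> Map2D:
    """Fuse same-shaped maps with softmax-normalized scalar weights.

    Hypothetical stand-in for an FWAM-style weighting: a larger logit gives its
    map a larger share of the fused output.
    """
    if len(maps) != len(logits):
        raise ValueError("one weight per input map is required")
    weights = softmax(logits)
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [sum(wt * m[i][j] for wt, m in zip(weights, maps)) for j in range(w)]
        for i in range(h)
    ]


# Usage: fuse a 4x4 high-resolution map with an upsampled 2x2 low-resolution map.
high = [[1.0] * 4 for _ in range(4)]
low = [[2.0, 4.0], [6.0, 8.0]]
fused = weighted_fuse([high, upsample_nearest(low, 2)], [0.0, 0.0])
print(fused[0])  # equal logits give weights 0.5/0.5 -> [1.5, 1.5, 2.5, 2.5]
```

In a trained network the logits would be learned parameters, so the fusion could learn to favour whichever resolution carries the more useful information; here they are fixed by hand to keep the arithmetic easy to check.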