Construction sites are dangerous due to the complex interaction of workers with equipment, building materials, vehicles, etc. As a kind of protective gear, hardhats are crucial for the safety of people on construction sites. Therefore, it is necessary for administrators to identify the people that do not wear hardhats and send out alarms to them. As manual inspection is labor-intensive and expensive, it is ideal to handle this issue by a real-time automatic detector. As such, in this paper, we present an end-to-end convolutional neural network to solve the problem of detecting if workers are wearing hardhats. The proposed method focuses on localizing a person’s head and deciding whether they are wearing a hardhat. The MobileNet model is employed as the backbone network, which allows the detector to run in real time. A top-down module is leveraged to enhance the feature-extraction process. Finally, heads with and without hardhats are detected on multi-scale features using a residual-block-based prediction module. Experimental results on a dataset that we have established show that the proposed method could produce an average precision of 87.4%/89.4% at 62 frames per second for detecting people without/with a hardhat worn on the head.