Violence is a serious threat to societal health, and preventing it in airports, airplanes, and spacecraft is crucial. This study proposes Violence-YOLO, a model for accurate, real-time violence detection in complex environments, with the aim of enhancing public safety. The model is based on YOLOv9’s Generalized Efficient Layer Aggregation Network (GELAN-C). A multilayer SimAM attention module is incorporated into GELAN’s neck to identify attention regions in the scene. YOLOv9 modules are combined with GhostNet and RepGhostNet, yielding two new modules, RepNCSPELAN4_GB and RepNCSPELAN4_RGB. The shallow convolution in the backbone is replaced with GhostConv, reducing computational complexity. Additionally, an ultra-lightweight upsampler, Dysample, is introduced to enhance performance and reduce overhead. Finally, Focaler-IoU is used as the bounding-box regression loss to address the neglect of easy and hard samples during training, improving accuracy. The datasets are derived from RWF-2000 and Hockey. Experimental results show that Violence-YOLO outperforms GELAN-C: mAP@0.5 increases by 0.9%, computational load decreases by 12.3%, and model size is reduced by 12.4%, which is significant for embedded hardware such as the Raspberry Pi. Violence-YOLO can be deployed to monitor public places such as airports, where it handles complex backgrounds and detects violent behavior accurately and quickly. In addition, the model achieves 84.4% mAP on the Pascal VOC dataset with significantly fewer parameters than the previously refined detector. This study offers insights for real-time detection of violent behaviors in public environments.
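
To make the Focaler-IoU step concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly described form of Focaler-IoU, in which the plain IoU is linearly remapped over an interval [d, u] and clamped to [0, 1], and the loss is one minus the remapped value. The function names and the default thresholds d = 0.0 and u = 0.95 are illustrative assumptions; the abstract does not state the values used in this study.

```python
def box_iou(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_iou(iou, d=0.0, u=0.95):
    """Linearly remap IoU over [d, u] and clamp the result to [0, 1].

    d and u are assumed, illustrative thresholds: IoU values below d map
    to 0, values above u map to 1, and values in between are rescaled.
    """
    if u <= d:
        raise ValueError("u must be greater than d")
    return min(1.0, max(0.0, (iou - d) / (u - d)))


def focaler_iou_loss(pred, target, d=0.0, u=0.95):
    """Regression loss: 1 minus the remapped IoU of the two boxes."""
    return 1.0 - focaler_iou(box_iou(pred, target), d, u)


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
    print(box_iou(pred, target))
    print(focaler_iou_loss(pred, target, d=0.0, u=0.95))
```

Shifting the interval changes which samples contribute gradient: a higher d zeroes out low-IoU (hard) boxes, while a lower u saturates high-IoU (easy) boxes, so the choice of [d, u] determines which group the regression focuses on.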