Driving safety depends heavily on normal driving behavior, so the timely detection of dangerous driving patterns is crucial. In this paper, we propose an Hourglass Attention ResNet Network (HAR-Net) to detect dangerous driving behavior. Optical flow data, RGB data, and RGB-D data are fed into the network as separate inputs for spatial–temporal fusion. In the spatial fusion part, we combine ResNet-50 and the hourglass network as the backbone of CenterNet. To improve accuracy, we add an attention mechanism to the network and integrate center loss into the original Softmax loss. Additionally, we construct a dangerous driving behavior dataset to evaluate the proposed model. Ablation and comparative studies demonstrate the efficacy of each HAR-Net component. Notably, HAR-Net achieves a mean average precision of 98.84% on our dataset, surpassing other state-of-the-art networks for detecting distracted driving behaviors.
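For readers unfamiliar with the combined objective, a common way to integrate center loss into a Softmax loss can be sketched as follows. This is an illustrative formulation only, not necessarily the exact one used in HAR-Net. All symbols are assumptions introduced here: $x_i$ is the feature vector of sample $i$, $y_i$ its class label, $c_{y_i}$ the learned center of class $y_i$, $W_j$ and $b_j$ the classifier weights and bias for class $j$, $m$ the batch size, $n$ the number of classes, and $\lambda$ a balancing weight.

```latex
% Sketch under stated assumptions; requires the amsmath package.
\begin{align*}
  % Softmax (cross-entropy) loss averaged over a batch of m samples
  L_S &= -\frac{1}{m}\sum_{i=1}^{m}
         \log \frac{\exp\!\left(W_{y_i}^{\top} x_i + b_{y_i}\right)}
                   {\sum_{j=1}^{n} \exp\!\left(W_j^{\top} x_i + b_j\right)} \\
  % Center loss: squared distance of each feature to its class center
  L_C &= \frac{1}{2m}\sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^2 \\
  % Joint objective; lambda trades off separability against compactness
  L   &= L_S + \lambda L_C
\end{align*}
```

Under this formulation, $L_S$ encourages features of different classes to be separable, while $L_C$ pulls features of the same class toward a shared center; setting $\lambda = 0$ recovers the plain Softmax loss.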