Against the backdrop of rising road traffic accident rates, measures to prevent road traffic accidents have always been a pressing issue in Taiwan. Road traffic accidents are mostly caused by speeding and roadway obstacles, especially in the form of rockfalls, potholes, and car crashes (involving damaged cars and overturned cars). To address this, it was necessary to design a real-time detection system that could detect speed limit signs, rockfalls, potholes, and car crashes, which would alert drivers to make timely decisions in the event of an emergency, thereby preventing secondary car crashes. This system would also be useful for alerting the relevant authorities, enabling a rapid response to the situation. In this study, a hierarchical deep-learning-based object detection model is proposed based on You Only Look Once v7 (YOLOv7) and mask region-based convolutional neural network (Mask R-CNN) algorithms. In the first level, YOLOv7 identifies speed limit signs and rockfalls, potholes, and car crashes. In the second level, Mask R-CNN subdivides the speed limit signs into nine categories (30, 40, 50, 60, 70, 80, 90, 100, and 110 km/h). The images used in this study consisted of screen captures of dashcam footage as well as images obtained from the Tsinghua-Tencent 100K dataset, Google Street View, and Google Images searches. During model training, we employed Gaussian noise and image rotation to simulate poor weather conditions as well as obscured, slanted, or twisted objects. Canny edge detection was used to enhance the contours of the detected objects and accentuate their features. The combined use of these image-processing techniques effectively increased the quantity and variety of images in the training set. During model testing, we evaluated the model’s performance based on its mean average precision (mAP). The experimental results showed that the mAP of our proposed model was 8.6 percentage points higher than that of the YOLOv7 model—a significant improvement in the overall accuracy of the model. In addition, we tested the model using videos showing different scenarios that had not been used in the training process, finding the model to have a rapid response time and a lower overall mean error rate. To summarize, the proposed model is a good candidate for road safety detection.