This research presents a robust approach to automated safety diagnosis using sensor fusion. The proposed framework fuses the outputs of a roadside low-resolution camera and a solid-state LiDAR. Vehicles in the video stream are detected and classified with the YOLOv5 object detection model. The raw 3D point clouds generated by the LiDAR are processed in two manual steps, ground plane transformation and background segmentation, followed by two real-time steps, foreground clustering and bounding box fitting. Given the 2D bounding boxes from both the camera and the LiDAR, we associate matching bounding box pairs by thresholding the Euclidean distance between their centroids at 6 ft. Weighted measurement updates are then performed, with each sensor weighted according to the root mean square error of its detections against manually labeled ground truths. The fused measurements are tracked with a linear constant-velocity Kalman filter. From the resulting trajectories, we compute post-encroachment time to identify pixel-level conflicts. We further propose a complete bipartite graph-matching strategy over vehicle parts, combined with the conflict angle, to classify conflicts into four types: rear-end, sideswipe, head-on, and angle. A case study at a signalized intersection is presented. The proposed framework achieves 97.384% precision and 95.316% recall, outperforming both single-sensor systems in detection count and localization. The proposed method can be employed to diagnose road safety problems and inform the selection of appropriate countermeasures.
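The centroid-gated association and RMSE-weighted fusion steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact implementation: the box format `(cx, cy, w, h)`, the greedy nearest-centroid matching, the inverse-RMSE weighting scheme, and all numeric values are illustrative.

```python
import math

# Illustrative assumption: 2D boxes as (cx, cy, w, h) in feet, one list per sensor.
ASSOC_THRESHOLD_FT = 6.0  # centroid distance gate from the paper


def associate(cam_boxes, lidar_boxes, gate=ASSOC_THRESHOLD_FT):
    """Greedy nearest-centroid association of camera and LiDAR boxes
    within the Euclidean distance gate. Returns (camera_idx, lidar_idx) pairs."""
    pairs = []
    used = set()
    for i, (cx, cy, *_rest) in enumerate(cam_boxes):
        best_j, best_d = None, gate
        for j, (lx, ly, *_rest) in enumerate(lidar_boxes):
            if j in used:
                continue
            d = math.hypot(cx - lx, cy - ly)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


def fuse(cam_box, lidar_box, cam_rmse, lidar_rmse):
    """Weighted centroid fusion: each sensor is weighted by the inverse of its
    RMSE against ground truth, so the more accurate sensor dominates."""
    w_cam, w_lidar = 1.0 / cam_rmse, 1.0 / lidar_rmse
    s = w_cam + w_lidar
    return tuple((w_cam * c + w_lidar * l) / s
                 for c, l in zip(cam_box[:2], lidar_box[:2]))


# Example: one common detection within the gate, one unmatched per sensor.
cam = [(0.0, 0.0, 6.0, 4.0), (50.0, 50.0, 6.0, 4.0)]
lidar = [(2.0, 1.0, 5.0, 4.0), (100.0, 100.0, 5.0, 4.0)]
pairs = associate(cam, lidar)          # → [(0, 0)]
fused = fuse(cam[0], lidar[0], cam_rmse=2.0, lidar_rmse=1.0)
```

In a full pipeline the fused centroid would then be fed as the measurement into the constant-velocity Kalman filter; a Hungarian (optimal) assignment could replace the greedy matching when detections are dense.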