With the rising incidence of traffic accidents and growing environmental concerns, the demand for advanced systems to ensure traffic and environmental safety has become increasingly urgent. This paper introduces an automated highway safety management framework that integrates computer vision and natural language processing for real-time monitoring, analysis, and reporting of traffic incidents. The system not only identifies accidents but also aids in coordinating emergency responses, such as dispatching ambulances, fire services, and police, while simultaneously managing traffic flow. The approach begins with the creation of a diverse highway accident dataset, combining public datasets with drone and CCTV footage. YOLOv11s is retrained on this dataset to enable real-time detection of critical traffic elements and anomalies, such as collisions and fires. A vision–language model (VLM), Moondream2, is employed to generate detailed scene descriptions, which are further refined by a large language model (LLM), GPT 4-Turbo, to produce concise incident reports and actionable suggestions. These reports are automatically sent to relevant authorities, ensuring prompt and effective response. The system’s effectiveness is validated through the analysis of diverse accident videos and zero-shot simulation testing within the Webots environment. The results highlight the potential of combining drone and CCTV imagery with AI-driven methodologies to improve traffic management and enhance public safety. Future work will include refining detection models, expanding dataset diversity, and deploying the framework in real-world scenarios using live drone and CCTV feeds. This study lays the groundwork for scalable and reliable solutions to address critical traffic safety challenges.