Real-time road quality monitoring involves using technologies to collect data on road conditions, including information on potholes, cracks, and other defects. This information can help improve safety for drivers and reduce the costs associated with road damage. Traditional inspection methods are time-consuming and expensive, leading to limited spatial coverage and delayed responses to changing road conditions. With the widespread use of smartphones and ubiquitous computing technologies, data can be collected on a large scale from the built-in sensors of mobile phones and from in-vehicle video. How these data can be used for road pothole detection is therefore a question of significant practical relevance. Current methods rely either on acceleration sequence classification or on deep learning-based image recognition. However, accelerometer-based detection has limited coverage and is sensitive to driving speed, while image recognition-based detection is strongly affected by ambient light. To address these issues, this study proposes a method that fuses accelerometer data and in-vehicle video data uploaded by participating users. The preprocessed accelerometer data and extracted video frames are encoded into real-valued vectors and projected into a common feature space. A deep learning-based training approach is used to learn from this common space and identify road anomalies. Spatial density-based clustering is applied in a multi-vehicle scenario to improve reliability and refine the detection results. The performance of the model is evaluated with confusion matrix-based classification metrics. Real-world vehicle experiments show that the proposed method improves accuracy by 6% compared with the traditional method. Consequently, the proposed method provides a novel approach for large-scale pavement anomaly detection.
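
As a rough illustration of the multi-vehicle aggregation step, the sketch below clusters geotagged anomaly detections with density-based clustering (DBSCAN over GPS coordinates). The function name, column layout, neighbourhood radius, and minimum-report threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def cluster_anomaly_reports(latitudes, longitudes, radius_m=10.0, min_reports=3):
    """Group geotagged anomaly detections reported by multiple vehicles.

    Detections that lie within `radius_m` metres of each other are merged
    into one candidate road anomaly; isolated detections with fewer than
    `min_reports` nearby reports are treated as noise and discarded.
    """
    coords = np.radians(np.column_stack([latitudes, longitudes]))
    # With the haversine metric, DBSCAN works directly on (lat, lon) in
    # radians; eps is an angular distance, i.e. metres / Earth radius.
    labels = DBSCAN(
        eps=radius_m / EARTH_RADIUS_M,
        min_samples=min_reports,
        metric="haversine",
    ).fit_predict(coords)
    return labels  # -1 marks noise; other labels index merged anomaly sites

# Example: five reports, three of which fall within ~10 m of each other
# and are merged, while the remaining two are rejected as isolated noise.
lats = [31.23001, 31.23003, 31.23002, 31.24000, 31.25000]
lons = [121.47001, 121.47002, 121.47003, 121.48000, 121.49000]
print(cluster_anomaly_reports(lats, lons))  # e.g. [0 0 0 -1 -1]
```

In this kind of aggregation, requiring several independent reports before confirming an anomaly suppresses spurious detections from any single vehicle, which is one way the multi-vehicle scenario can improve reliability.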