To study the semantic detection accuracy of 3D vehicle accident video, an accident detection method combining 2D images and 3D information is proposed. The 3D semantic bounding boxes generated by the vehicle 3D detection and tracking task are used to extract vehicle motion features, including the vehicle trajectory and the dimensions and orientation of the 3D bounding box, and to establish the evaluation index for accident detection. During training, the average loss over each batch of 1000 images is computed and the parameters are updated by stochastic gradient descent; the learning rate is set to 0.001 for the first 30,000 iterations and 0.0001 for the last 10,000 iterations. The experimental results show that the CEM algorithm achieves a multiple object tracking accuracy (MOTA) of 78.4% with false positives (FP) of 1.1% and false negatives (FN) of 3.5%, and the 3-DCMK algorithm achieves a MOTA of 88.6% with FP of 0.9% and FN of 1.9%, while the proposed method achieves a MOTA of 89.3% with FP of 0.9% and FN of 1.2%. These results indicate that the proposed method detects 3D targets in vehicle accident video stably and accurately.
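
As a rough illustration of two quantities named in the abstract, the sketch below encodes the two-stage learning rate (0.001 for the first 30,000 iterations, 0.0001 for the last 10,000) and the standard CLEAR MOT definition of MOTA. The function names, the 0-based iteration index, and the use of identity switches in the MOTA formula are assumptions made for illustration; the abstract does not state how the authors implemented the schedule or normalized FP and FN.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not the authors' implementation.

TOTAL_ITERATIONS = 40_000   # 30,000 + 10,000 iterations, as stated in the abstract
SWITCH_ITERATION = 30_000   # iteration at which the learning rate drops


def learning_rate(iteration: int) -> float:
    """Piecewise-constant learning rate (0-based iteration index assumed)."""
    if not 0 <= iteration < TOTAL_ITERATIONS:
        raise ValueError("iteration must be in [0, 40000)")
    return 1e-3 if iteration < SWITCH_ITERATION else 1e-4


def mota(false_negatives: int, false_positives: int,
         id_switches: int, ground_truth: int) -> float:
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are totals summed over every frame of the sequence.
    """
    if ground_truth <= 0:
        raise ValueError("ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth
```

With this definition, MOTA falls as any of the three error counts grows, which is why a method with fewer false negatives at the same false-positive level scores higher.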