This study proposes a method for identifying structural damage in bridges that combines the continuous wavelet transform (CWT) with a two-dimensional convolutional neural network (2D-CNN). By pairing the vehicle–bridge coupled vibration response with a deep learning model, the method extends the applicability of indirect bridge damage identification. To test the proposed method, a spatial vehicle–bridge computational model of a three-span continuous beam bridge is established, and bridge damage is simulated by reducing element stiffness under different damage conditions. To account for the stochastic nature of road roughness, a self-developed vehicle–bridge coupled vibration analysis program is used to obtain the vehicle acceleration responses and construct the dataset. The vertical acceleration signal of the vehicle is transformed by CWT into a two-dimensional grayscale time-frequency image, which serves as the input to the 2D-CNN; this model is chosen for its high sensitivity to features in two-dimensional data. The results show that the method identifies both the location and the severity of bridge structural damage with high accuracy, offering a new approach to the identification and assessment of bridge structural damage.
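The CWT preprocessing step can be illustrated with a minimal NumPy sketch, which is not the study's implementation. The abstract does not state the mother wavelet, scales, sampling rate or image size, so the analytic Morlet wavelet (w0 = 6), the 0.5 to 20 Hz frequency grid, the 200 Hz sampling rate, the 64 × 64 output size, the synthetic input signal and the helper names `morlet_cwt` and `scalogram_to_grayscale` are all assumptions made for illustration.

```python
import numpy as np


def morlet_cwt(signal, freqs, dt, w0=6.0):
    """FFT-based continuous wavelet transform with an analytic Morlet wavelet.

    Assumption: Morlet wavelet with w0 = 6 (the abstract does not name the wavelet).
    Returns a complex array of shape (len(freqs), len(signal)); row i holds the
    coefficients at the scale whose wavelet spectrum peaks at freqs[i] (Hz).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset before transforming
    n = x.size
    x_hat = np.fft.fft(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)  # angular frequency of each FFT bin
    # The Morlet spectrum peaks where s * omega = w0, i.e. at f = w0 / (2 * pi * s)
    scales = w0 / (2.0 * np.pi * np.asarray(freqs, dtype=float))
    coeffs = np.empty((scales.size, n), dtype=complex)
    for i, s in enumerate(scales):
        # Morlet wavelet in the Fourier domain, zero for non-positive frequencies
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        psi_hat *= np.sqrt(2.0 * np.pi * s / dt)  # unit-energy scaling at each scale
        coeffs[i] = np.fft.ifft(x_hat * psi_hat)  # psi_hat is real, so no conjugate needed
    return coeffs


def scalogram_to_grayscale(coeffs, size=(64, 64)):
    """Map |CWT| to an 8-bit grayscale image, resized by nearest-neighbour sampling.

    Assumption: min-max normalization and a 64 x 64 output size.
    """
    mag = np.abs(coeffs)
    span = mag.max() - mag.min()
    mag = (mag - mag.min()) / span if span > 0 else np.zeros_like(mag)
    rows = np.linspace(0, mag.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, mag.shape[1] - 1, size[1]).round().astype(int)
    img = mag[np.ix_(rows, cols)]
    return np.round(img * 255).astype(np.uint8)


if __name__ == "__main__":
    fs = 200.0  # assumed sampling rate (Hz)
    t = np.arange(1000) / fs  # 5 s record
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a vehicle vertical acceleration record (not data from the study)
    acc = (np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.sin(2 * np.pi * 8.0 * t)
           + 0.1 * rng.standard_normal(t.size))
    freqs = np.linspace(0.5, 20.0, 40)  # assumed analysis band (Hz)
    img = scalogram_to_grayscale(morlet_cwt(acc, freqs, dt=1.0 / fs))
    print(img.shape, img.dtype)  # (64, 64) uint8
```

Min-max normalization to the range 0 to 255 turns every acceleration record into a single-channel image of fixed size, which is the kind of input a 2D-CNN expects; in practice the wavelet, frequency band and image resolution would need to be chosen to match the vehicle and bridge dynamics under study.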