Among the components of an unmanned surface vehicle (USV), underwater thrusters are critical to mission execution. Because thrusters interact directly with the marine environment, they are continually exposed to conditions that cause faults. To diagnose thruster faults, a vibration-based method was employed; it is non-invasive, cost-effective, and requires no modification of existing systems. However, vibration data collected inside the hull are affected by propeller-fluid interactions, hull damping, and structural resonant frequencies, which makes the signals noisy and difficult to interpret. Furthermore, conventional Fourier-transform-based frequency analysis, which assumes a stationary signal, is not suitable for distinguishing faults over the entire range of a thruster’s rotational speeds rather than only at fixed speeds. To address these difficulties, the Continuous Wavelet Transform (CWT), which captures the physical characteristics of a signal jointly in time and frequency, was applied to transform the vibration data into scalograms. The scalograms were then classified with a Vision Transformer (ViT), a classifier known for its global context awareness in image processing. The effectiveness of this diagnostic approach was verified through experiments on a USV designed for field testing. Seven cases covering different fault types and severities were diagnosed, yielding average accuracies of 0.9855 and 0.9908 at two different vibration measurement points, respectively.
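To make the scalogram step concrete, the following is a minimal sketch of a CWT magnitude map computed for a signal whose frequency changes over time, standing in for a thruster sweeping through rotational speeds. Everything specific here is an assumption for illustration and is not taken from the study: the Morlet mother wavelet, the centre frequency `w0 = 6`, the 1/s normalisation, the 200 Hz sampling rate, the synthetic chirp, and the helper names `morlet_scalogram` and `ridge_frequency`. The sketch uses only the Python standard library so that it is self-contained; in practice a library routine such as PyWavelets' `pywt.cwt` would normally be used instead.

```python
import cmath
import math


def morlet_scalogram(x, fs, freqs, w0=6.0):
    """Return |CWT| of x: one row per analysis frequency, one value per sample."""
    dt = 1.0 / fs
    n = len(x)
    rows = []
    for f in freqs:
        s = w0 / (2.0 * math.pi * f)          # scale (seconds) centred on f
        half = int(math.ceil(4.0 * s * fs))   # truncate the envelope at +/- 4 scales
        # Complex-conjugated Morlet wavelet sampled on the truncated support.
        kernel = [
            math.pi ** -0.25
            * math.exp(-0.5 * (k * dt / s) ** 2)
            * cmath.exp(-1j * w0 * k * dt / s)
            for k in range(-half, half + 1)
        ]
        row = []
        for i in range(n):
            acc = 0j
            for j, psi in enumerate(kernel):
                k = i + j - half              # signal index under this kernel tap
                if 0 <= k < n:                # samples outside the record count as zero
                    acc += x[k] * psi
            row.append(abs(acc) * dt / s)     # 1/s normalisation (assumed choice)
        rows.append(row)
    return rows


def ridge_frequency(scalogram, freqs, i):
    """Analysis frequency with the largest magnitude at sample index i."""
    column = [row[i] for row in scalogram]
    return freqs[column.index(max(column))]


# Synthetic chirp (illustrative only): instantaneous frequency 10 + 15 t Hz.
fs = 200.0
t = [i / fs for i in range(400)]
x = [math.cos(2.0 * math.pi * (10.0 * ti + 7.5 * ti * ti)) for ti in t]

freqs = [5.0 + 2.5 * m for m in range(19)]    # 5 Hz to 50 Hz in 2.5 Hz steps
scalogram = morlet_scalogram(x, fs, freqs)

early = ridge_frequency(scalogram, freqs, 100)  # t = 0.5 s, true frequency 17.5 Hz
late = ridge_frequency(scalogram, freqs, 300)   # t = 1.5 s, true frequency 32.5 Hz
print(early, late)
```

The dominant frequency in each scalogram column moves upward as the chirp progresses, which is the time-localised information that a single Fourier spectrum of the whole record would smear together. A two-dimensional magnitude map of this kind, rendered as an image, is the sort of input an image classifier such as a ViT could consume.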