The Internet of Things (IoT) ecosystem in smart cities demands fast, reliable, and efficient image data transmission to enable real-time Computer Vision (CV) applications. To meet these demands, Orthogonal Frequency Division Multiplexing (OFDM)-based communication systems have been widely adopted because of their high spectral efficiency and data rate. When such a system is used to transmit images over fading channels, however, noise is introduced into the signal and heavily distorts the recovered image. Although this noise corrupts pixel values independently, certain intrinsic properties of the image, such as spatial information, may remain intact; a Deep Learning (DL) model can extract these as multidimensional features (in its convolution layers) and interpret them (in its top layers). This study therefore analyzes the robustness of such DL models across various OFDM-based image communication systems for CV applications in an Intelligent Transportation Systems (ITS) environment. Our analysis shows that the EfficientNetV2-based model achieves 70–90% accuracy across different OFDM-based image communication systems over a Rayleigh fading channel. In addition, applying different data augmentation techniques improves accuracy by up to 18%.