The number of power semiconductor devices in the grid has increased with the integration of renewable energy sources, causing power quality disturbances (PQDs). This poses a major challenge for smart city infrastructures. Accurate detection and classification of PQDs are important for grid reliability and for protecting consumers’ appliances in a smart city environment. Power quality monitoring has traditionally relied on simple machine learning classifiers or signal processing methods. However, recent advancements have introduced Deep Convolutional Neural Networks (DCNNs) as promising methods for the detection and classification of PQDs. These techniques can achieve high classification accuracy, which makes them better suited to real-time operation in a smart city framework. This paper presents a voting ensemble approach to classify sixteen PQDs, using DCNN architectures through transfer learning. In this process, the continuous wavelet transform (CWT) is employed to convert one-dimensional (1-D) PQD signals into time–frequency images. Four pre-trained DCNN architectures, i.e., Residual Network-50 (ResNet-50), Visual Geometry Group-16 (VGG-16), AlexNet and SqueezeNet, are trained and implemented in MATLAB, using images from four datasets, i.e., without noise, 20 dB noise, 30 dB noise and random noise. We also tested the performance of ResNet-50 with a squeeze-and-excitation (SE) mechanism. ResNet-50 with the SE mechanism achieved better classification accuracy; however, it incurs additional computational overhead. The classification performance is further enhanced by using the voting ensemble model. The results indicate that the proposed scheme improves accuracy to 99.98%, precision to 99.97%, recall to 99.80% and F1-score to 99.85%.
This work demonstrates that ResNet-50 with the SE mechanism is a viable choice as a single classification model, while an ensemble approach further improves the generalization performance for PQD classification.
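To illustrate the voting step, the following is a minimal sketch of hard (majority) voting over the per-sample labels predicted by several classifiers. It is written in Python for illustration only, whereas the work itself is implemented in MATLAB. The abstract does not state whether hard or soft voting is used, so hard voting is an assumption, as are the function name `majority_vote`, the tie-breaking rule (the earliest model in list order wins) and the example labels.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists by hard majority voting.

    predictions: list of lists, one inner list per model, each holding the
    predicted label for every sample (all inner lists have equal length).
    Ties are broken in favour of the earliest model in the list
    (an assumption, not taken from the paper).
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # First label (in model order) that reaches the highest vote count.
        winner = next(label for label in sample_votes if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical predictions from four models on three samples.
resnet50 = ["sag", "swell", "flicker"]
vgg16 = ["sag", "swell", "harmonics"]
alexnet = ["sag", "interruption", "harmonics"]
squeezenet = ["swell", "swell", "flicker"]

ensemble = majority_vote([resnet50, vgg16, alexnet, squeezenet])
print(ensemble)
```

Listing the strongest single model first means that it decides any tie, which is one simple way to handle an even number of voters.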