Insulators used on overhead transmission lines perform two important functions: electrical insulation and mechanical support. Because they are heavily exposed to the environment, they are affected by climate, temperature, and durability, and they are prone to self-explosion, damage, missing units, and other faults. These faults seriously threaten the safety of power transmission, so insulators must be inspected. With the development of unmanned aerial vehicle (UAV) technology, staff now use UAVs to take aerial photographs of the insulators and then inspect the images by eye. Although this method appears reliable, in practice the volume of collected insulator images is large and the backgrounds are complex, so inspectors usually rely on experience to make judgements, and mistakes occur easily. In recent years, with the rapid development of computer technology, computer-aided detection and identification of insulator faults has received increasing attention. To improve the detection accuracy for self-exploded insulators, especially in bad weather, and to overcome the influence of fog on target detection, a recurrent attention convolutional neural network (RA-CNN) is used for optimization. Through recursive multi-scale attention, multi-scale feature information is concatenated, attention regions are generated recursively from coarse to fine, and the target region is located to achieve the best results. The experimental results show that the proposed method effectively improves the fault diagnosis of insulators. With multi-level cascade optimization, RA-CNN achieves higher accuracy than baseline models such as FCAN and MG-CNN, whose accuracies are 74.9% and 75.6%, respectively.
In addition, ablation experiments at different scales showed that the three two-level combinations achieved recognition accuracies of 78.2%, 81.4%, and 83.6%, while the three-level combination reached 85.3%, the highest accuracy among the models compared.
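The coarse-to-fine attention loop described above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. `toy_features` and `toy_attention` are hypothetical stand-ins for the CNN backbone and the attention proposal network, which are learned in RA-CNN. The attention region is parameterized by a square centre `(tx, ty)` and half-side `tl`. It is cut out with a hard crop and zoomed back to full size by bilinear interpolation, whereas the original RA-CNN uses a differentiable, sigmoid-based mask during training. Features from every scale are concatenated ("connected in series") into one descriptor.

```python
import numpy as np


def crop_and_zoom(image, tx, ty, tl, out_size):
    """Hard-crop the square centred at (tx, ty) with half-side tl (clipped to the
    image) and resize it to out_size x out_size with bilinear interpolation."""
    h, w = image.shape
    x0, x1 = max(tx - tl, 0.0), min(tx + tl, w - 1.0)
    y0, y1 = max(ty - tl, 0.0), min(ty + tl, h - 1.0)
    # Source coordinates sampled for each output pixel
    xs, ys = np.linspace(x0, x1, out_size), np.linspace(y0, y1, out_size)
    xi0, yi0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    xi1, yi1 = np.minimum(xi0 + 1, w - 1), np.minimum(yi0 + 1, h - 1)
    wx, wy = (xs - xi0)[None, :], (ys - yi0)[:, None]
    # Interpolate along x on the two neighbouring rows, then along y
    top = image[yi0][:, xi0] * (1 - wx) + image[yi0][:, xi1] * wx
    bottom = image[yi1][:, xi0] * (1 - wx) + image[yi1][:, xi1] * wx
    return top * (1 - wy) + bottom * wy


def toy_features(image, grid=4):
    """Hypothetical backbone stand-in: average-pool the image on a grid x grid layout."""
    h, w = image.shape
    cropped = image[: h - h % grid, : w - w % grid]
    blocks = cropped.reshape(grid, h // grid, grid, w // grid)
    return blocks.mean(axis=(1, 3)).ravel()


def toy_attention(image, shrink=0.5):
    """Hypothetical attention-proposal stand-in: centre the square on the brightest
    pixel; the square's side is `shrink` times the current image side."""
    h, w = image.shape
    ty, tx = np.unravel_index(np.argmax(image), image.shape)
    tl = shrink * min(h, w) / 2
    return float(tx), float(ty), tl


def recurrent_attention(image, num_scales=3):
    """Coarse-to-fine loop: extract features, propose a region, crop and zoom,
    repeat; per-scale features are concatenated into one vector."""
    size = image.shape[0]
    current = image.astype(float)
    per_scale = []
    for scale in range(num_scales):
        per_scale.append(toy_features(current))
        if scale < num_scales - 1:  # no crop is needed after the finest scale
            tx, ty, tl = toy_attention(current)
            current = crop_and_zoom(current, tx, ty, tl, size)
    return np.concatenate(per_scale)


if __name__ == "__main__":
    img = np.random.default_rng(0).random((64, 64))
    print(recurrent_attention(img).shape)  # (48,): 16 pooled features per scale
```

With three scales and a 4 x 4 pooling grid, the fused descriptor has 3 x 16 = 48 entries. In a real RA-CNN, each scale would feed its own classifier, and the region proposals would be trained so that finer scales focus on the most discriminative part, such as a damaged insulator disc.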