Fast and accurate detection of insulator defects against complex aerial backgrounds (such as trees, hillsides, lakes, and buildings) is essential to the safe operation of transmission lines. Existing detection methods struggle to locate defect targets because such backgrounds strongly interfere with the targets in aerial images. To address this problem, we propose an insulator defect detection model based on a cascaded network. First, we introduce a hierarchical semantic segmentation network that separates the target insulator from the complex background; embedding it into the main feature extraction branch forms a "segmentation-detection" cascade network that reduces background interference when extracting target information. Second, to address the direct fusion of conflicting information from different feature layers in the bi-directional path aggregation neck of the detection network, we propose an across-scale feature pyramid with a feature refinement structure, which enhances the features of insulator defect targets and improves the multi-scale representation ability of the network. Third, to address hard samples and the imbalance between positive and negative samples during target detection, we propose a focal shape intersection over union loss (focal-SIOU-loss), which improves the precision and stability of the regression process by combining the weight adjustment mechanism of focal loss with the structural similarity of SIOU loss. Finally, experimental results show that, compared with standard detection models such as YOLOv5, YOLOv7, and YOLOv8, the proposed model achieves better precision, recall, and robustness in detecting insulator defects under complex backgrounds.
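To make the idea of a focal-weighted IoU-based regression loss concrete, the following is a minimal sketch under stated assumptions, not the formulation used in this work. It assumes one common weighting scheme, in which the base loss is scaled by IoU raised to a power `gamma`. It also uses plain `1 - IoU` as a stand-in for the SIOU loss, whose additional penalty terms are not specified here. The function names, the `(x1, y1, x2, y2)` box format, and the default `gamma = 0.5` are all hypothetical choices made for illustration.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focal_weighted_iou_loss(pred: Box, target: Box, gamma: float = 0.5) -> float:
    """Focal-style weighting of an IoU-based loss (illustrative only).

    Assumption: the base loss is 1 - IoU, standing in for the SIOU loss,
    and the focal weight is IoU ** gamma. Both are hypothetical choices.
    """
    overlap = iou(pred, target)
    base_loss = 1.0 - overlap          # stand-in for the SIOU loss
    focal_weight = overlap ** gamma    # assumed focal-style weight
    return focal_weight * base_loss


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 0.0, 3.0, 2.0)
    print(focal_weighted_iou_loss(predicted, ground_truth))
```

Under this assumed weighting, boxes with no overlap receive zero weight and therefore contribute no loss. Whether that behavior is desirable depends on how the actual loss treats hard samples, which this sketch does not attempt to reproduce.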