Thermal infrared imaging can reveal hidden faults in many types of high-voltage power equipment, making it highly valuable for power inspection. However, thermal-imaging-based abnormal heating detection remains difficult because abnormal regions vary in appearance and backgrounds introduce complex temperature interference. To address these problems, a contour-based instance segmentation network is first proposed that uses both thermal (T) and visual (RGB) images to achieve high-accuracy segmentation in complex and changing environments. Specifically, modality-specific features are encoded by two-stream backbones and fused in the spatial, channel, and frequency domains. In this way, modality differences are handled well, and effective complementary information is extracted for object detection and contour initialization. A transformer decoder is then used to model the long-range relationships between contour points and background points and to deform the contour points. Next, an auto-encoder-based reconstruction network is developed to learn the distribution of power equipment using the proposed random augmentation strategy. Meanwhile, a UNet-like discriminative network directly explores the differences between the reconstructed and original images, capturing the deviations in poorly reconstructed regions for abnormal heating detection. Many images are acquired in transformer substations under different weather conditions and at different times of day to build datasets with pixel-level annotations. Extensive experiments are conducted for qualitative and quantitative evaluation, and the comparison results demonstrate the effectiveness and robustness of the proposed instance segmentation method. The practicality and performance of the proposed abnormal heating detection method are evaluated on image patches containing different kinds of insulators.
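
The reconstruction-based detection idea can be illustrated with a minimal sketch: if a reconstruction network reproduces normal equipment well, abnormally heated regions should show a large gap between the original and the reconstructed patch. The code below is not the proposed method. It assumes a fixed threshold on the absolute per-pixel residual as a simple stand-in for the learned UNet-like discriminative network, and the function name `residual_anomaly_mask`, the `threshold` value, and the toy 4x4 patch are all hypothetical.

```python
def residual_anomaly_mask(original, reconstructed, threshold=0.2):
    """Return the per-pixel absolute residual and a boolean anomaly mask.

    Assumption: a fixed threshold on the residual replaces the learned
    discriminative network; inputs are 2-D lists of normalized intensities.
    """
    same_rows = len(original) == len(reconstructed)
    if not same_rows or any(
        len(a) != len(b) for a, b in zip(original, reconstructed)
    ):
        raise ValueError("images must have the same shape")

    # Absolute difference between each original and reconstructed pixel.
    residual = [
        [abs(o - r) for o, r in zip(row_o, row_r)]
        for row_o, row_r in zip(original, reconstructed)
    ]
    # Pixels that the reconstruction fails to reproduce are flagged.
    mask = [[value > threshold for value in row] for row in residual]
    return residual, mask


# Toy normalized thermal patch with a single hot spot at row 1, column 2.
patch = [[0.3] * 4 for _ in range(4)]
patch[1][2] = 0.9
# Hypothetical reconstruction that reproduces only the normal background.
recon = [[0.3] * 4 for _ in range(4)]

residual, mask = residual_anomaly_mask(patch, recon)
```

In this toy case only the hot-spot pixel is flagged. A learned discriminator, as described in the abstract, would replace the fixed threshold with a mapping trained on original and reconstructed image pairs.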