Thermographic inspection is particularly effective at identifying thermal bridges because it visualizes temperature differences on a building's surface. This work focuses on energy-efficient computing for deep learning-based thermal bridge (anomaly) detection models. We concentrate on object detection-based models, namely Mask R-CNN_FPN_50, Swin-T Transformer, and FSAF, and benchmark them on the TBRR dataset with varying input sizes. To make these models more energy-efficient, we apply optimizations such as compression, latency reduction, and pruning. With the proposed improvements, inference with the compressed Mask R-CNN_FPN_50 anomaly detection model is approximately 7.5% faster than with the original, and the speed-up increases with input size for all models. A second criterion we consider is the total energy gain of the optimized models. The Swin-T Transformer shows the largest inference energy gains for all input sizes (≈ 27 J for 3000 × 4000 and ≈ 14 J for 2400 × 3400). In conclusion, our study presents an optimization of size, weight, and power for vision-based anomaly detection in buildings.