Inspection of curtain wall metal hangings traditionally relies on manual visual checks, which are costly, slow, and limited in coverage. To reduce cost and improve efficiency and accuracy, we designed a nondestructive automatic inspection system for architectural curtain walls based on millimeter-wave imaging. The system uses single-sided reflective point-frequency imaging, which mitigates the interference in the echo signal caused by waves reflected from the upper and lower surfaces of the curtain wall during millimeter-wave penetration. To address the fine-grained classification of metal hangings, we propose the efficient channel attention Vision Transformer (ECA-ViT), a lightweight classification network built on a hybrid architecture of convolutional neural network and Transformer. Within this network, the inverted residual attention module (IRAM) increases the attention weight the network assigns to the image foreground, and the low-rank MobileViT module (LR-ViT) provides global modeling while making the whole model more lightweight by reducing the computational complexity of the self-attention mechanism. Experimental results show that the proposed method achieves an accuracy of 95.66% with fewer model parameters and lower computational complexity.
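As an illustration of the channel-attention idea that ECA-ViT is named after, the following is a minimal pure-Python sketch of a generic efficient channel attention (ECA) gate: global average pooling per channel, a small 1-D convolution across neighbouring channels, a sigmoid, and channel-wise rescaling. It is not the IRAM or LR-ViT module described above, whose internals are not given here. The function names `eca_kernel_size` and `eca`, the defaults `gamma=2` and `b=1`, and the zero padding are assumptions made for this sketch; in a trained network the convolution weights would be learned.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Pick an odd 1-D kernel size from the channel count (assumed heuristic)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Apply a generic ECA gate (illustrative, hypothetical helper).

    feature_map: list of C channels, each an H x W list of lists of floats.
    weights: list of k (odd) 1-D convolution weights; learned in practice.
    Returns (rescaled feature map, per-channel gates).
    """
    k = len(weights)
    pad = k // 2

    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]

    # 2) 1-D convolution across neighbouring channels (zero padding assumed).
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[i + j] for j, w in enumerate(weights))
        for i in range(len(pooled))
    ]

    # 3) Sigmoid turns each mixed descriptor into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4) Rescale every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
    return scaled, gates
```

Because the gate uses only `k` weights shared across all channels, it adds very few parameters, which is why this style of attention is attractive in lightweight networks.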