Existing computer vision-based techniques for detecting surface defects on metal materials typically struggle with overlapping defects, large variation within classes, and high similarity between defect samples. These issues reduce feature extraction accuracy and lead to missed and false detections. This study proposes a feature optimization-guided, high-precision, real-time metal surface defect detection network (FOHR Net) to improve the expressiveness of defect features. First, the network introduces a multi-layer feature alignment module that fuses shallow and deep features to enhance the feature information relevant to the target defect. Second, a dual-branch feature recombination module reorganizes the slice features and applies channel-level soft attention to produce a channel-optimized feature map. The output features of the dual-branch transformation stage are then merged adaptively, which may reduce the loss of feature information, improve feature expressiveness, and help the model capture useful features. Finally, we carried out extensive experiments on the NEU-DET, GC10-DET, and APDDD datasets. FOHR Net achieves mean average precision (mAP) values of 78.3%, 70.5%, and 65.9% on these datasets, respectively, outperforming other widely used defect detection methods. Ablation experiments and visualized detection results further illustrate the efficacy of our approach.
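To make the idea of channel-level soft attention with adaptive merging of two branches more concrete, the following is a minimal NumPy sketch. It is not the FOHR Net module itself: the abstract does not specify the layer design, so the function name `channel_soft_attention_merge`, the two-layer projection, the weight matrices `w_reduce` and `w_expand`, and all tensor sizes are assumptions made for illustration, loosely following the general pattern of pooling a channel descriptor and applying a softmax across branches.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def channel_soft_attention_merge(branch_a, branch_b, w_reduce, w_expand):
    """Merge two (C, H, W) feature maps using per-channel soft weights.

    w_reduce (C, R) and w_expand (R, 2*C) stand in for learned weights;
    they are hypothetical and randomly initialised in this sketch.
    """
    fused = branch_a + branch_b                      # (C, H, W) coarse fusion
    descriptor = fused.mean(axis=(1, 2))             # (C,) global average pooling
    hidden = np.maximum(descriptor @ w_reduce, 0.0)  # (R,) ReLU bottleneck
    logits = (hidden @ w_expand).reshape(2, -1)      # (2, C) one row per branch
    weights = softmax(logits, axis=0)                # per channel, weights sum to 1
    # Broadcast the (C,) weights over the spatial dimensions of each branch.
    merged = (weights[0][:, None, None] * branch_a
              + weights[1][:, None, None] * branch_b)
    return merged, weights


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
C, H, W, R = 8, 4, 4, 4
a = rng.standard_normal((C, H, W))
b = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, R))
w2 = rng.standard_normal((R, 2 * C))

merged, weights = channel_soft_attention_merge(a, b, w1, w2)
```

Because the softmax is taken across the two branches, each channel of the merged map is a convex combination of the corresponding channels of the inputs, which is one plausible reading of "adaptively merged" in this setting.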