Metal gear end-face defect detection faces several common problems: non-detection regions reduce accuracy, defects are small and appear at multiple scales, and neural network hyperparameters are difficult to optimize automatically. These problems lead to inadequate accuracy and efficiency, making existing methods unsuitable for the real-time online detection that industry demands. To address them, this study proposes SF-YOLONet, a method for detecting defects on metal gear end faces that uses an optimized evolutionary algorithm. First, a testing platform was constructed to detect surface defects on metal gear end faces. Next, to address the impact of non-detection regions on accuracy, this study introduces the SF algorithm, a visual saliency-based image extraction method, to eliminate interference from ineffective features in non-detection regions and from edge burrs. Additionally, a network for detecting end-face defects in metal gears (YOLONet) is introduced, which integrates the CBAM module and the BiFPN feature extraction strategy. These components enhance adaptive learning and feature extraction for small defects on gear end faces and combine low-resolution features with deep-level semantic information, improving the detection of small and multi-scale defects. Finally, the ISSA algorithm is introduced to optimize the hyperparameters of the SF-YOLONet model, avoiding the instability of manual parameter tuning. Experiments show that the SF-YOLONet model achieved an average precision of 98.01% and an F1 score of 0.99 on the metal gear end-face defect test dataset, and the average detection time per image on the YOLONet model was 0.13 seconds. Compared with other deep learning models, the proposed SF-YOLONet model significantly improves precision and efficiency in detecting gear end-face defects, meeting the real-time online detection requirements of industry.
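To make the BiFPN feature fusion step more concrete, the following is a minimal sketch of the "fast normalized fusion" used in the original BiFPN design (EfficientDet): each input feature map I_i receives a learnable scalar weight w_i, clamped at zero, and the output is O = sum_i (w_i / (eps + sum_j w_j)) * I_i. Whether YOLONet uses exactly this weighting is an assumption, and the function name `fast_normalized_fusion` and the flattened-list representation of feature maps are hypothetical choices made to keep the example dependency-free.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-shape (flattened) feature maps with learnable scalar weights.

    Sketch of BiFPN-style fast normalized fusion (assumed, not taken from the paper):
    O = sum_i (relu(w_i) / (eps + sum_j relu(w_j))) * I_i
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature map")
    length = len(features[0])
    # BiFPN resizes inputs (upsampling/downsampling) before fusion; here we only check.
    if any(len(f) != length for f in features):
        raise ValueError("input feature maps must already have the same shape")

    # Clamp weights at zero (ReLU) so every normalized weight lies in [0, 1).
    relu_w = [max(w, 0.0) for w in weights]
    denom = eps + sum(relu_w)
    norm_w = [w / denom for w in relu_w]

    # Weighted element-wise sum of the input feature maps.
    fused = [0.0] * length
    for w, fmap in zip(norm_w, features):
        for k, value in enumerate(fmap):
            fused[k] += w * value
    return fused


if __name__ == "__main__":
    # Hypothetical top-down node: fuse a level's own features with an upsampled deeper level.
    p4_in = [1.0, 2.0]
    p5_up = [3.0, 4.0]
    print(fast_normalized_fusion([p4_in, p5_up], [1.0, 1.0]))
```

With equal weights the output is close to the element-wise mean of the inputs; a negative raw weight is clamped to zero, so that input is effectively ignored. Compared with a softmax over the weights, this normalization avoids exponentials, which the BiFPN authors motivated by speed on GPUs.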