Screen defect detection has become an important research area, driven by the growing need for precise and efficient quality control in mobile device production. This study presents FusionScratchNet (FS-Net), a novel algorithm designed to overcome noise interference and to characterize indistinct defects and subtle scratches on mobile phone screens. By integrating transformer and convolutional neural network (CNN) architectures, FS-Net captures both global and local features, which strengthens its feature representation. The global–local feature integrator (GLFI) module fuses global and local information through channel splitting, feature dependency characterization, and attention mechanisms, enhancing target features while suppressing noise. The bridge attention (BA) module computes an attention feature map from the multi-layer fused features, focusing on scratch characteristics and recovering details lost during downsampling. Evaluations on the PKU-Market-Phone dataset yielded an overall accuracy of 98.04%, an extended intersection over union (EIoU) of 88.03%, and an F1-score of 65.13%. Compared with established methods such as You Only Look Once (YOLO) and RetinaNet, FS-Net achieved higher detection accuracy, better computational efficiency, and greater resilience to noise. These results indicate that the proposed method improves the accuracy of scratch segmentation.
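
The reported evaluation metrics can be illustrated with a minimal sketch for binary segmentation masks. The abstract does not define EIoU, so the code below uses the standard intersection over union as a stand-in. The function name `segmentation_metrics` and the toy masks are hypothetical and are not taken from the FS-Net implementation.

```python
def segmentation_metrics(pred, target):
    """Pixel accuracy, IoU, and F1 for binary masks given as nested lists of 0/1.

    Illustrative only: standard IoU is used here because the abstract
    does not specify how EIoU is computed.
    """
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # scratch pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as scratch
            elif t:
                fn += 1      # scratch pixel missed
            else:
                tn += 1      # background correctly predicted

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    # Assumed convention: two empty masks count as a perfect match.
    union = tp + fp + fn
    iou = tp / union if union else 1.0

    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0

    return {"accuracy": accuracy, "iou": iou, "f1": f1}


# Toy 4x4 example: a thin "scratch" covering 3 of 16 pixels (hypothetical data).
target = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
metrics = segmentation_metrics(pred, target)
print(metrics)
```

On this toy mask the pixel accuracy is 0.875, while IoU is 0.5 and F1 is about 0.67. This shows one way accuracy can sit well above F1 when defect pixels make up a small fraction of the image, since correctly labeled background dominates the accuracy count.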