Industrial fault diagnosis faces imbalanced fault samples, scarce labeled data, and high computational cost. To address these challenges, this paper introduces a high-precision, low-resource, end-to-end fault diagnosis framework with two components. First, we propose a data augmentation method based on GCGAN, in which both the generator and the discriminator are built from CNN and GRU layers. A novel Smoothed Hinge-Cross-Entropy loss stabilizes training and mitigates mode collapse and vanishing gradients. Second, we design a lightweight fault diagnosis model, MDSCNN-ICA-BiGRU. Replacing standard convolutions with depthwise separable convolutions in the deeper layers substantially reduces model complexity while the network still extracts multiscale spatial features. An improved Coordinate Attention (ICA) mechanism suppresses noise and strengthens the extraction of high-frequency features. A bidirectional GRU (BiGRU) then captures global temporal dependencies, fusing spatial and temporal features. Experiments show that the approach performs well on both public simulation datasets and private laboratory datasets. Compared with other benchmark methods, GCGAN-based augmentation improves CNN classification accuracy by 10%. Compared with classic convolutional networks such as DRSN and WDCNN, MDSCNN-ICA-BiGRU converges faster and more stably, reaches near-100% test accuracy, and reduces computation cost by approximately 70% on average. Under noise, its accuracy remains high and degrades slowly, indicating good robustness and generalization.
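The complexity reduction from depthwise separable convolutions can be checked with simple parameter arithmetic. The sketch below is illustrative only: the function names, the 3x3 kernel, and the 128-to-256-channel layer are hypothetical choices, not the actual configuration of MDSCNN-ICA-BiGRU, and biases are ignored. A standard convolution with a kernel of k^d elements needs k^d * C_in * C_out weights, whereas a depthwise convolution (one k^d filter per input channel) followed by a 1x1 pointwise convolution needs k^d * C_in + C_in * C_out, so the ratio between the two is 1/C_out + 1/k^d.

```python
def standard_conv_params(k: int, c_in: int, c_out: int, dims: int = 2) -> int:
    """Weights in a standard convolution with a k^dims kernel (bias omitted)."""
    return (k ** dims) * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int, dims: int = 2) -> int:
    """Weights in a depthwise k^dims convolution plus a 1x1 pointwise convolution (bias omitted)."""
    depthwise = (k ** dims) * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical deeper layer: 3x3 kernel, 128 input channels, 256 output channels.
    k, c_in, c_out = 3, 128, 256
    std = standard_conv_params(k, c_in, c_out)
    dws = depthwise_separable_params(k, c_in, c_out)
    # Prints the two parameter counts and the fractional reduction.
    print(std, dws, round(1 - dws / std, 3))  # 294912 33920 0.885
```

Because the ratio 1/C_out + 1/k^d shrinks as C_out grows, the savings are largest in deeper, wider layers, which is consistent with applying the substitution there. For 1-D kernels on raw vibration signals, pass `dims=1`. The per-layer figure printed here is not a reproduction of the roughly 70% average computation-cost reduction reported above, which is measured on the whole model.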