Reaching reliable decisions on equipment maintenance is facilitated by the implementation of intelligent fault diagnosis techniques for rotating machineries. Recently, the Transformer model has demonstrated exceptional capabilities in global feature modeling for fault diagnosis tasks, garnering significant attention from the academic community. However, it lacks sufficient prior knowledge regarding rotation invariance, scale, and shift, necessitating pre-training on extensive datasets. In comparison, contemporary convolutional neural networks exhibit greater ease of optimization. This limitation becomes particularly evident when applying the Transformer model in fault diagnosis scenarios with limited data availability. Moreover, the increasing the number of parameters and FLOPs. pose a challenge to its suitability for mobile services due to the limited computational resources available on edge devices. To mitigate these issues, this paper introduces a novel lightweight Transformer (SepFormer) based on separable linear self-attention for fault diagnosis task. The SepFormer performs a novel sequence-level feature embedding to better leverage the inductive bias inherent in the convolutional layers. Furthermore, it integrate a novel separable linear self-attention mechanism into the Transformer architecture, effectively mitigating the computational burden concerns and significantly enhancing the training convergence speed. Extensive experiments are conducted extensively on a bearing fault dataset and gear fault dataset. The experimental results demonstrate that the SepFormer achieves a top-1 accuracy exceeding state-of-the-art approaches by more than 5%, while utilizing the fewest FLOPs. Moreover, the optimizability of SepFormer surpasses that of CNN, ensuring its superior preservation of inductive bias.