Conventional deep learning-based bearing fault diagnosis methods tend to rely on denoising modules to improve diagnostic performance in noisy scenes. However, these modules add considerable computational cost, which delays the fault diagnosis results. This work proposes a lightweight batch normalization-free residual network for bearing fault diagnosis that uses no denoising modules. Instead of applying batch normalization, it properly rescales the weights of a standard initialization to avoid the exploding and vanishing gradient problems at the beginning of training deep neural networks, and it thereby avoids the undesirable properties caused by batch normalization. Compared with other methods, the proposed method maintains a high level of fault diagnosis performance across different input sizes and batch sizes. In noisy scenes in particular, it improves the testing accuracy on different bearing datasets by 13.54% and 7.74% while using fewer parameters and FLOPs.
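
The idea of rescaling a standard initialization in place of batch normalization can be illustrated with a short sketch. The abstract does not state the exact scaling rule, so the code below assumes a Fixup-style scheme: He-initialized weights in each residual branch are multiplied by L^(-1/(2m-2)) for a network with L residual branches of m layers each, and the last layer of each branch starts at zero. The function names (`he_std`, `branch_scale`, `init_residual_branch`) and the layer sizes are hypothetical and serve only as an illustration; this is not the authors' implementation.

```python
import math
import random


def he_std(fan_in):
    """Standard deviation of the standard He (Kaiming) normal initialization."""
    return math.sqrt(2.0 / fan_in)


def branch_scale(num_blocks, layers_per_branch):
    """Assumed Fixup-style factor L^(-1/(2m-2)) for L residual branches
    of m layers each (an assumption; the abstract gives no formula)."""
    if layers_per_branch < 2:
        raise ValueError("layers_per_branch must be at least 2")
    return num_blocks ** (-1.0 / (2 * layers_per_branch - 2))


def init_residual_branch(width, num_blocks, layers_per_branch, rng):
    """Return the weight matrices of one residual branch, without batch norm.

    Every layer except the last draws from a rescaled He normal distribution.
    The last layer is all zeros, so the block starts as an identity mapping.
    """
    std = he_std(width) * branch_scale(num_blocks, layers_per_branch)
    layers = [
        [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        for _ in range(layers_per_branch - 1)
    ]
    layers.append([[0.0] * width for _ in range(width)])
    return layers


# Hypothetical configuration: 16 residual blocks, 2 layers per branch, width 64.
rng = random.Random(0)
branch = init_residual_branch(width=64, num_blocks=16, layers_per_branch=2, rng=rng)
```

With 16 blocks of two layers each, the assumed factor is 16^(-1/2) = 0.25, so the first layer of every branch starts with a quarter of the He standard deviation. Shrinking the branch weights with depth in this way is what keeps the signal variance from growing with the number of stacked blocks when no normalization layer is present.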