Rolling bearings are crucial components of rotating machinery, and their health states directly affect the overall performance of the machinery. It is therefore essential to detect and diagnose bearing faults. Numerous bearing fault diagnosis methods have been successfully used to ensure the safe operation of rotating machinery. However, practical working environments contain a considerable amount of noise, which prevents traditional methods from achieving accurate fault diagnosis. This paper proposes a new multi-head attention residual network (MARNet) for rolling bearing fault diagnosis under noisy conditions. MARNet optimizes the residual units by simplifying the multi-layer convolutions into a single-layer convolution and replaces the rectified linear unit (ReLU) activation function with the exponential linear unit (ELU), which is better suited to this task. Additionally, a multi-head attention mechanism is introduced into the residual block to capture correlation information between any two time sequences, enhancing the network’s feature extraction capability. The effectiveness and superiority of MARNet in noisy environments are demonstrated through experiments on two bearing datasets, from Case Western Reserve University (CWRU) and Paderborn University (PU). The experimental results show that the proposed method achieves better noise robustness and generalization capability than several recent deep learning methods for fault diagnosis of rolling bearings.
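As a rough illustration of the three ingredients named above (the ELU activation, a residual unit with a single convolution, and multi-head self-attention over time steps), the following minimal sketch uses only the Python standard library. It is not the MARNet implementation: the function names, the kernel, the ordering of the skip connection and activation, and the omission of learned projection matrices in the attention step are all assumptions made for illustration.

```python
import math


def relu(x):
    """Rectified linear unit: zero for negative inputs."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def conv1d_same(signal, kernel):
    """Single-channel 1-D cross-correlation with zero padding.

    Assumes an odd kernel length so the output has the same length as the input.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def residual_unit(signal, kernel, alpha=1.0):
    """Hypothetical residual unit: one convolution, a skip connection, then ELU."""
    conv = conv1d_same(signal, kernel)
    return [elu(c + s, alpha) for c, s in zip(conv, signal)]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x, num_heads):
    """Scaled dot-product self-attention over time steps, split into heads.

    x is a list of T time steps, each a list of D features. Each head sees a
    contiguous slice of D / num_heads features. Simplification (assumption):
    queries, keys, and values are the inputs themselves, with no learned
    projections.
    """
    d = len(x[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = d // num_heads
    out = [[] for _ in x]
    for h in range(num_heads):
        heads = [step[h * head_dim:(h + 1) * head_dim] for step in x]
        for t, q in enumerate(heads):
            # Similarity of time step t to every time step, scaled by sqrt(head_dim).
            scores = [
                sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim)
                for k in heads
            ]
            weights = softmax(scores)
            # Weighted average of all time steps for this head.
            context = [
                sum(w * v[i] for w, v in zip(weights, heads))
                for i in range(head_dim)
            ]
            out[t].extend(context)
    return out
```

Unlike ReLU, ELU returns a small negative value for negative inputs instead of zero, so `elu(-2.0)` is about -0.86 whereas `relu(-2.0)` is 0. The attention step compares every time step with every other one, which is the sense in which it can relate any two positions in a sequence.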