Bearings are widely used supporting components that often operate under harsh conditions, which gives rise to diverse fault types. Traditional bearing fault diagnosis relies primarily on time–frequency analysis, which typically requires expert experience to identify faults accurately. Intelligent fault recognition and classification methods, by contrast, frequently lack interpretability. To address both limitations, this paper introduces a convolutional neural network with an attention mechanism, denoted CBAM-CNN, for bearing fault diagnosis. The network incorporates a Convolutional Block Attention Module (CBAM) to strengthen its extraction of fault features in the time–frequency domain. In addition, the proposed method integrates a weight visualization module, Gradient-weighted Class Activation Mapping (Grad-CAM), which improves the interpretability of the convolutional neural network by generating visual heatmaps over the fault time–frequency graphs. On the dataset used in this study, the CBAM-CNN achieves an accuracy of 99.81%, outperforming the Base-CNN and converging faster. Furthermore, analysis of the attention weights shows that the method focuses on distinct regions for different fault types and fault degrees. The interpretability experiments indicate that the CBAM balances the weight allocation, emphasizing the frequency distribution of the signal rather than its amplitude distribution. This mitigates, to some extent, the influence of signal amplitude on the diagnostic model.
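To make the Grad-CAM heatmap step concrete, the following is a minimal sketch of the standard Grad-CAM computation: each channel's weight is the spatial average of the class-score gradient, and the heatmap is the ReLU of the weighted sum of the feature maps. It is not the paper's implementation. The function name `grad_cam` and the toy inputs are hypothetical, and the gradients are hand-picked here, whereas in practice they would be obtained by backpropagating the class score to the last convolutional layer.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from K feature maps and their matching gradients.

    feature_maps, gradients: lists of K maps, each an H x W list of floats.
    gradients[k][i][j] is d(class score) / d(feature_maps[k][i][j]).
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # Channel weight alpha_k: global average of that channel's gradients.
    alphas = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum over channels, then ReLU to keep only positive evidence.
    heatmap = [
        [
            max(0.0, sum(a * fmap[i][j] for a, fmap in zip(alphas, feature_maps)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; an all-zero map is left unchanged.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Toy input (assumed values): two 2x2 feature maps with hand-picked gradients.
toy_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
toy_grads = [
    [[1.0, 1.0], [1.0, 1.0]],      # alpha = 1.0
    [[-1.0, -1.0], [-1.0, -1.0]],  # alpha = -1.0
]
toy_heatmap = grad_cam(toy_maps, toy_grads)
```

In this toy case the second channel receives a negative weight, so the locations where it dominates are zeroed by the ReLU. For a time–frequency input, the resulting map would be upsampled to the size of the time–frequency graph and overlaid on it.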