Benefiting from the event-driven nature and sparse spiking communication of the brain, spiking neural networks (SNNs) are becoming a promising energy-efficient alternative to traditional artificial neural networks (ANNs). However, the performance gap between SNNs and ANNs has long been a major obstacle to deploying SNNs ubiquitously. To leverage the full potential of SNNs, we study the effect of attention mechanisms, which enable SNNs to focus on important information. We first present our idea of attention in SNNs with a plug-and-play kit of modules, termed the Multi-dimensional Attention (MA) module. Then, a new end-to-end trained attention SNN architecture, called "MA-SNN", is proposed, which infers attention weights along the temporal, channel, and spatial dimensions separately or simultaneously. Following existing neuroscience theories, we exploit the attention weights to optimize membrane potentials, which in turn regulate the spiking response in a data-dependent way. At the cost of negligible additional parameters, MA enables vanilla SNNs to achieve sparser spiking activity, better performance, and higher energy efficiency concurrently. Experiments are conducted on event-based DVS128 Gesture/Gait action recognition and ImageNet-1K image classification. On Gesture/Gait, spike counts are reduced by 84.9%/81.6%, while task accuracy and energy efficiency improve by 5.9%/4.7% and 3.4×/3.2×, respectively. On ImageNet-1K, we achieve top-1 accuracies of 75.92% and 77.08% with single-step and 4-step Res-SNN-104, which are state-of-the-art results for SNNs. Compared with the counterpart Res-ANN-104, the accuracy gap is -0.95/+0.21 percentage points, with 31.8×/7.4× better energy efficiency. To the best of our knowledge, this is the first time that the SNN community has achieved comparable or even better performance than its ANN counterpart on a large-scale dataset. To analyze and support the effectiveness of MA-SNN, we theoretically prove that spike degradation and gradient vanishing, which commonly occur in general SNNs, can be resolved by introducing the block dynamical isometry theory. We also analyze the efficiency of MA-SNN with our proposed spiking-response visualization method. Our work highlights the potential of SNNs as a general backbone to support various applications in the field of SNN research, with a great balance between effectiveness and efficiency.
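To make the core mechanism concrete, below is a minimal PyTorch-style sketch of multi-dimensional attention over a spiking feature tensor: squeeze-and-excitation-style gates along the temporal and channel dimensions plus a convolutional spatial gate rescale the input current before membrane-potential integration. All names here (e.g. `MultiDimAttention`, the `[T, B, C, H, W]` layout, the reduction ratio) are illustrative assumptions for exposition, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MultiDimAttention(nn.Module):
    """Temporal/channel/spatial attention over a [T, B, C, H, W] tensor."""
    def __init__(self, timesteps: int, channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze-and-excitation-style gates for timesteps and channels.
        self.t_fc = nn.Sequential(
            nn.Linear(timesteps, max(timesteps // reduction, 1)),
            nn.ReLU(),
            nn.Linear(max(timesteps // reduction, 1), timesteps),
            nn.Sigmoid(),
        )
        self.c_fc = nn.Sequential(
            nn.Linear(channels, max(channels // reduction, 1)),
            nn.ReLU(),
            nn.Linear(max(channels // reduction, 1), channels),
            nn.Sigmoid(),
        )
        # A single conv over the channel-mean map yields per-pixel weights.
        self.s_conv = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=7, padding=3), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T, B, C, H, W = x.shape
        # Temporal gate: pool all non-time dims, then weight each timestep.
        t_w = self.t_fc(x.mean(dim=(2, 3, 4)).transpose(0, 1))  # [B, T]
        x = x * t_w.transpose(0, 1)[:, :, None, None, None]
        # Channel gate: pool over space, weight each channel per timestep.
        c_w = self.c_fc(x.mean(dim=(3, 4)))                     # [T, B, C]
        x = x * c_w[:, :, :, None, None]
        # Spatial gate: channel-mean map -> conv -> per-pixel weights.
        s_w = self.s_conv(x.mean(dim=2).reshape(T * B, 1, H, W))
        return x * s_w.reshape(T, B, 1, H, W)

# Usage: gate the input current before the spiking neuron integrates it,
# so membrane potentials (and hence spike responses) are modulated in a
# data-dependent way. Shapes below are arbitrary for illustration.
ma = MultiDimAttention(timesteps=4, channels=64)
x = torch.randn(4, 8, 64, 32, 32)   # [T, B, C, H, W] feature tensor
gated = ma(x)                        # same shape, attention-weighted
```

Because every gate is a sigmoid in (0, 1), the module can only suppress input current, which is consistent with the sparser spiking activity reported above; the extra parameters are limited to two small MLPs and one 7×7 conv, which matches the "negligible additional parameters" claim under these assumptions.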