Transformer-based methods have achieved impressive performance in image super-resolution (SR). To reduce the computational cost and redundancy of global attention, most transformer-based methods adopt a localized attention mechanism, which sacrifices desirable properties of self-attention (SA), such as the modeling of long-range dependencies and the global receptive field. To alleviate this problem, we propose a dilated neighborhood attention transformer for image SR (DiNAT-SR), which improves SwinIR by replacing SA with dilated neighborhood attention (DiNA) to capture more global context and to let the receptive field grow exponentially with depth. In addition, we introduce a convolutional modulation block into the transformer to enhance visual representation and facilitate smoother convergence during training. To the best of our knowledge, this work is the first to confirm the feasibility of DiNA for image SR. Experimental results demonstrate the effectiveness of DiNAT-SR, which outperforms SwinIR on most benchmarks both quantitatively and visually. We also compare lightweight image SR models: our model outperforms SwinIR-light on all benchmarks with a similar number of parameters and floating-point operations. The effectiveness of each introduced component is further validated by an ablation study.
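
As a rough illustration of the exponential receptive-field growth mentioned above, the following minimal Python sketch (our own illustration, not code from the paper; the geometric dilation schedule is an assumption) compares how the receptive field widens when stacking plain neighborhood attention layers versus layers whose dilation grows with depth, along a single token axis.

```python
# Minimal sketch (hypothetical illustration): receptive-field growth of local attention
# with fixed dilation (NA) vs. alternating/geometric dilations (DiNA), along one axis.

def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in tokens) after stacking one attention layer per dilation.
    Each layer with kernel size k and dilation d widens the field by (k - 1) * d."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

k, depth = 7, 4
na_dilations = [1] * depth                        # plain neighborhood attention
dina_dilations = [k ** i for i in range(depth)]   # dilation grows geometrically (assumed schedule)

print("NA  :", receptive_field(k, na_dilations))    # 1 + depth*(k-1) -> linear growth (25)
print("DiNA:", receptive_field(k, dina_dilations))  # k**depth -> exponential growth (2401)
```

With a fixed dilation of 1 the receptive field grows only linearly with depth, whereas the geometric schedule reaches k^depth tokens, which is the intuition behind replacing SA with DiNA in DiNAT-SR.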