Recently, deep-learning-based image super-resolution methods have made remarkable progress. However, most of these methods do not fully exploit the structural features of the input image or the features produced by intermediate layers, which limits their ability to recover fine details. To address this issue, we propose a gradient-guided and multi-scale feature network for image super-resolution (GFSR). Specifically, we design a dual-branch network consisting of a trunk branch and a gradient branch; the gradient branch extracts a gradient feature map that serves as a structural prior to guide image reconstruction. To exploit features from different layers, we propose two effective multi-scale feature extraction modules, the residual of residual inception block (RRIB) and the residual of residual receptive field block (RRRFB), and embed them at different depths of the network. Within RRIB and RRRFB, an adaptive weighted residual feature fusion block (RFFB) fuses the intermediate features into more informative representations, and an adaptive channel attention block (ACAB) models the dependencies between channels to further strengthen feature representation. Experimental results on several benchmark datasets demonstrate that our method outperforms state-of-the-art methods in both subjective visual quality and objective quantitative metrics.
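The abstract does not state how the gradient branch obtains its structural prior. A common choice in gradient-guided super-resolution is a per-pixel gradient-magnitude map built from horizontal and vertical intensity differences. The sketch below illustrates only that assumed preprocessing step in plain Python. The helper name `gradient_map`, the central-difference operator, and the replicate padding at the borders are all illustrative assumptions, not the GFSR implementation.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def gradient_map(img: Image) -> Image:
    """Hypothetical helper: gradient-magnitude map via central differences.

    Assumption: GFSR's actual gradient operator is not specified in the
    abstract; this is one common way to build a structural prior.
    """
    h, w = len(img), len(img[0])

    def px(r: int, c: int) -> float:
        # Replicate padding: clamp indices to the nearest border pixel.
        r = min(max(r, 0), h - 1)
        c = min(max(c, 0), w - 1)
        return img[r][c]

    out: Image = []
    for r in range(h):
        row = []
        for c in range(w):
            gx = px(r, c + 1) - px(r, c - 1)  # horizontal intensity change
            gy = px(r + 1, c) - px(r - 1, c)  # vertical intensity change
            row.append(math.sqrt(gx * gx + gy * gy))  # gradient magnitude
        out.append(row)
    return out
```

A flat region yields zeros, while a vertical step edge produces non-zero responses only in the columns next to the edge. In a dual-branch design, a map like this would highlight where structural detail must be preserved.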