There is a gap between the rapid development of 4K display technologies and the shortage of 4K content. Super-resolution (SR) serves as a bridge between this supply and demand. Recently, convolutional neural network (CNN) based methods have demonstrated strong performance in image SR. However, most existing methods require large model capacity and expensive computation to achieve high performance. In addition, most methods keep the upscaling part relatively simple compared with the feature extraction part. For feature fusion, some methods directly concatenate multi-level features, which is suboptimal because it ignores the differing importance of those features. In this work, we propose a recursive multi-stage upscaling network (RMUN) with multiple sub-upscaling modules (SUMs) and a discriminative self-ensemble module (SEM). Specifically, we extract local hierarchical features with a novel feature extraction module (FEM), which is applied recursively to reduce the number of parameters. Then, we construct multiple sub-upscaling modules to produce various high-resolution features in forward propagation. This strategy strengthens the upscaling part and provides multiple error feedback routes. Furthermore, we employ the SEM for global hierarchical feature recalibration, which selectively emphasizes informative features and suppresses less useful ones. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed method performs comparably with state-of-the-art methods in terms of the balance between model size and performance.
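Two ideas in the abstract can be illustrated with a toy sketch: recursion reuses one set of parameters across several extraction steps, and recalibration weights each feature level by a learned gate instead of concatenating all levels with equal importance. The sketch below is not the RMUN implementation. The abstract does not specify the internals of the FEM or SEM, so the function names, the scalar parameters, and the sigmoid gate over a global average are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def recursive_extract(x, weight, bias, steps):
    """Apply one shared affine + ReLU block `steps` times.

    Hypothetical stand-in for a recursive feature extractor: the same
    (weight, bias) pair is reused at every step, so the parameter count
    stays constant no matter how many steps are run. Every intermediate
    output is kept as one "level" of hierarchical features.
    """
    levels = []
    for _ in range(steps):
        x = [max(0.0, weight * v + bias) for v in x]
        levels.append(x)
    return levels


def recalibrate(levels, gate_weights, gate_biases):
    """Scale each feature level by a gate computed from its global mean.

    Hypothetical stand-in for feature recalibration: each level gets an
    importance weight in (0, 1), so informative levels can be emphasized
    and less useful ones suppressed before fusion.
    """
    recalibrated = []
    for level, w, b in zip(levels, gate_weights, gate_biases):
        descriptor = sum(level) / len(level)   # global average pooling
        gate = sigmoid(w * descriptor + b)     # importance weight in (0, 1)
        recalibrated.append([gate * v for v in level])
    return recalibrated


if __name__ == "__main__":
    levels = recursive_extract([1.0, -1.0], weight=0.5, bias=0.1, steps=3)
    fused = recalibrate(levels, gate_weights=[2.0, 0.0, -2.0],
                        gate_biases=[0.0, 0.0, 0.0])
    for level in fused:
        print(level)
```

In this toy setting, running three recursive steps still uses only two extractor parameters, while the gates give each of the three levels its own scaling factor. A real network would use convolutional layers and learn the gate parameters by backpropagation.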