Crowd counting aims to estimate the number of people in crowded scenes, which is important for security systems, traffic control, and similar applications. Existing methods typically rely on local features and cannot properly handle the perspective distortion and varying scales in congested scene images, and therefore produce inaccurate counts. To alleviate this issue, this study proposes a multi-scale residual feature-aware network (MSR-FAN) that combines multi-scale features using multiple receptive field sizes and learns feature-aware information for each image. The MSR-FAN is trained end-to-end to generate a high-quality density map and estimate the crowd count. The method consists of three parts. The first part, the direction-based feature-enhanced network, handles perspective changes by encoding perspective information in four directions based on the initial image features. The second part, the proposed multi-scale residual block module, captures global information to better represent regional features; it explores features at different scales and reinforces the global feature. The third part, the feature-aware block, is designed to extract the features hidden in the different channels. Experimental results on benchmark datasets show that the proposed approach outperforms existing state-of-the-art methods.
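To illustrate how a density map relates to a crowd count, the following minimal sketch follows the common convention in density-based crowd counting: each annotated head contributes a Gaussian of unit mass, so the sum over the map approximates the number of people. This convention is an assumption here, since the abstract does not state how the MSR-FAN ground truth is built. The function names (`gaussian_kernel`, `density_map`, `crowd_count`), the kernel size, and the value of sigma are hypothetical choices for illustration, not the authors' settings.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(height, width, heads, size=7, sigma=1.5):
    """Build a density map with one unit-mass Gaussian per head.

    `heads` is a list of (row, col) positions assumed to lie inside the image.
    Kernel size and sigma are illustrative assumptions.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for row, col in heads:
        # Keep only the kernel cells that fall inside the image.
        cells = []
        for ky in range(size):
            for kx in range(size):
                y, x = row + ky - half, col + kx - half
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[ky][kx]))
        # Renormalize so a head near the border still contributes mass 1.
        mass = sum(v for _, _, v in cells)
        for y, x, v in cells:
            dmap[y][x] += v / mass
    return dmap


def crowd_count(dmap):
    """Estimate the count as the sum of all density values."""
    return sum(sum(row) for row in dmap)


heads = [(5, 5), (10, 20), (0, 0)]  # hypothetical head annotations
dmap = density_map(32, 32, heads)
print(round(crowd_count(dmap), 6))
```

Under this convention, a network that regresses the density map yields a count estimate by summing its output, which is why the quality of the predicted map directly affects counting accuracy.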