Crowd counting has recently received increasing attention. Counting in high-density scenes in particular has become an important research topic, yet existing methods do not perform optimally on extremely dense crowds. In this paper, we propose a multi-level attentive Convolutional Neural Network (MLAttnCNN) for crowd counting. We extract high-level contextual information by applying pooling at multiple scales, and we use multi-level attention modules to enrich features at different layers, which makes multi-scale feature fusion more efficient. The fused features are then passed through dilated convolutions and a 1 × 1 convolution to generate a more accurate density map. Extensive experiments on three publicly available datasets show that our proposed network outperforms state-of-the-art approaches.
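The following sketch illustrates the idea of extracting context with pooling at multiple scales. It uses one common pattern in the style of pyramid pooling: adaptive average pooling of a feature map into 1×1, 2×2 and 4×4 grids, followed by nearest-neighbour upsampling back to the input size. It is a minimal illustration under stated assumptions and not the MLAttnCNN implementation. The scales (1, 2, 4), the single-channel feature map stored as plain Python lists, and all function names are hypothetical. The attention modules, dilated convolutions and 1 × 1 convolution are not modelled.

```python
from typing import List, Sequence

# Hypothetical representation: one feature channel as a list of rows.
Grid = List[List[float]]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def adaptive_avg_pool(x: Grid, bins: int) -> Grid:
    """Average-pool a 2-D map into a bins x bins grid.

    Cell boundaries use floor/ceil, as adaptive average pooling does,
    so the map size need not be divisible by `bins`.
    """
    h, w = len(x), len(x[0])
    out: Grid = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, _ceil_div((i + 1) * h, bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, _ceil_div((j + 1) * w, bins)
            vals = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def upsample_nearest(x: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour upsampling of a small grid back to h x w."""
    bh, bw = len(x), len(x[0])
    return [[x[(r * bh) // h][(c * bw) // w] for c in range(w)] for r in range(h)]


def multi_scale_context(x: Grid, scales: Sequence[int] = (1, 2, 4)) -> List[Grid]:
    """Pool at each scale (hypothetical choice of scales) and upsample to input size."""
    h, w = len(x), len(x[0])
    return [upsample_nearest(adaptive_avg_pool(x, s), h, w) for s in scales]


if __name__ == "__main__":
    feat = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 map
    for s, ctx in zip((1, 2, 4), multi_scale_context(feat)):
        print(s, ctx)
```

Coarse scales summarise global context, such as the overall crowd density of the scene, and fine scales preserve local detail. In a CNN, the upsampled maps would typically be concatenated with the original features along the channel axis before further processing. The way MLAttnCNN fuses them through its attention modules is not reproduced here.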