The traditional detect-and-count strategy cannot cope well with extremely crowded footage in computer-vision-based counting tasks. In recent years, deep learning approaches have been widely explored to tackle this challenge. By regressing visual features to a density map, the total crowd count can be predicted without detecting the position of each individual. Efforts to improve performance are distributed across various stages of the pipeline, such as feature extraction and the correction of deviations in the regressed density map. In this article, we conduct a thorough review of the most representative and state-of-the-art techniques. These efforts are systematically categorized into three topics: the evolution of the front-end network, the handling of unbalanced density map prediction, and the selection of the loss function. After evaluating the most significant techniques, we inspect the innovations of the state-of-the-art methods in detail to analyze the specific reasons for their high performance. In conclusion, possible directions for enhancement are discussed to provide insights for future research.