People counting in high-density crowds is emerging as a new frontier in crowd video surveillance. Counting in such crowds faces many challenges, such as severe occlusions, few pixels per head, and large variations in head size. In this paper, we propose a novel Density Independent and Scale Aware model (DISAM), which works as a head detector and takes into account the scale variations of heads in images. Our model is based on the intuition that the head is the only visible part of a person in high-density crowds. To deal with different scales, unlike off-the-shelf Convolutional Neural Network (CNN) based object detectors, which use general object proposals as inputs to the CNN, we generate scale-aware head proposals based on a scale map. The scale-aware proposals are then fed to the CNN, which renders a response matrix consisting of head probabilities. We then apply non-maximum suppression to obtain accurate head positions. We conduct comprehensive experiments on two benchmark datasets and compare the performance with other state-of-the-art methods. Our experiments show that the proposed DISAM outperforms the compared methods in both frame-level and pixel-level comparisons.
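The final step, turning a response matrix of head probabilities into discrete head positions, can be illustrated with a minimal sketch. The abstract does not specify which suppression variant is used, so the greedy scheme below is an assumption, as are the function name `suppress_non_maxima` and the `threshold` and `radius` parameters; it is not the authors' implementation.

```python
import numpy as np


def suppress_non_maxima(response, threshold=0.5, radius=1):
    """Greedy non-maximum suppression on a 2-D response matrix.

    Hypothetical helper: keeps the strongest responses at or above
    `threshold` and drops any weaker response that lies within
    `radius` cells (Chebyshev distance) of an already kept one.
    Returns a list of (row, col) tuples, strongest first.
    """
    response = np.asarray(response, dtype=float)
    # Candidate cells: every position whose probability passes the threshold.
    candidates = np.argwhere(response >= threshold)
    scores = response[candidates[:, 0], candidates[:, 1]]
    # Visit candidates from highest to lowest probability.
    order = np.argsort(-scores, kind="stable")
    kept = []
    for idx in order:
        r, c = (int(v) for v in candidates[idx])
        # Keep the cell only if no stronger kept cell is within `radius`.
        if all(max(abs(r - kr), abs(c - kc)) > radius for kr, kc in kept):
            kept.append((r, c))
    return kept


# Toy response matrix with two clusters of high probability.
demo = np.zeros((5, 5))
demo[1, 1], demo[1, 2] = 0.9, 0.7   # one head, two neighbouring responses
demo[3, 4], demo[4, 4] = 0.8, 0.6   # a second head
heads = suppress_non_maxima(demo, threshold=0.5, radius=1)
print(len(heads), heads)
```

In this sketch the frame-level count is simply the number of kept positions. A fixed `radius` is a simplification: a scale-aware variant would presumably vary the suppression radius with the local head size.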