Crowd surveillance is attracting growing research interest because of its many potential applications. These include detecting unusual activity for security purposes, monitoring for record keeping, and counting occupants for facility planning and expansion. From a security viewpoint, detecting and tracking people and understanding their behavior in places with large crowds is highly important, because unruly crowds in public spaces can lead to serious health and security concerns. Crowd-related accidents cause injuries and deaths, and they often occur at events that were not properly planned. Organizers' planning relies heavily on understanding the behavior of the individuals and groups, numbering in the thousands, that make up a crowd. This need is the main motivation for this research. This work proposes a model that can count people in crowds, automatically detect and track them, and then estimate their direction and speed. Deep learning networks are costly to run: they need more memory and power than resource-limited edge devices can provide. We therefore propose a hybrid YOLOv4 that combines the detection method with training-phase pruning and a convolutional attention module. The hybrid YOLOv4 increases accuracy by 33% and reaches an mAP of 92.1%. When trained on the JHU dataset, the proposed hybrid YOLOv4 strategy also reduces computational memory requirements and closely meets the conditions for real-time applications. This work can help prevent dangerous overcrowding that leads to stampedes with disastrous consequences.
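
To make the direction and speed estimate mentioned above concrete, the following is a minimal sketch, not the method used in this work. It assumes that a tracker already supplies the centroid of one person in two frames; the function name `estimate_motion` and the parameters `dt_seconds` and `metres_per_pixel` are hypothetical and chosen only for illustration.

```python
import math


def estimate_motion(prev_xy, curr_xy, dt_seconds, metres_per_pixel=1.0):
    """Estimate speed and heading of one tracked person between two frames.

    prev_xy, curr_xy: (x, y) centroids in image pixels (assumed tracker output).
    dt_seconds: time elapsed between the two frames.
    metres_per_pixel: hypothetical calibration factor; with the default of 1.0
        the speed is reported in pixels per second.
    Returns (speed, heading_degrees), with heading in [0, 360).
    """
    if dt_seconds <= 0:
        raise ValueError("dt_seconds must be positive")

    # Displacement between the two centroids, scaled to the chosen unit.
    dx = (curr_xy[0] - prev_xy[0]) * metres_per_pixel
    dy = (curr_xy[1] - prev_xy[1]) * metres_per_pixel

    # Speed is the displacement magnitude divided by the elapsed time.
    speed = math.hypot(dx, dy) / dt_seconds

    # Heading is measured from the image x-axis. Image y grows downward,
    # so 90 degrees means the person moves toward the bottom of the frame.
    heading_degrees = math.degrees(math.atan2(dy, dx)) % 360.0
    return speed, heading_degrees


if __name__ == "__main__":
    # Example: a centroid moves 3 px right and 4 px down in one second.
    print(estimate_motion((0, 0), (3, 4), dt_seconds=1.0))
```

In practice the per-frame estimates would likely be smoothed over several frames, since detector jitter in the bounding boxes translates directly into noise in both speed and heading.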