The increasing number of people is a major cause of disasters due to overcrowding. Crowds gathering in public places can trigger panic, which in turn leads to disaster. This paper presents an analytical study of crowd management, which is essential for designing well-planned public spaces, surveillance that covers every area, and transportation systems. Disasters caused by uncontrolled crowd behaviour result in loss of property, fatalities, or other casualties. To help avoid them, crowd behaviour was analysed, and a multi-level feature fusion (MFF) framework was designed to predict it. The first level of MFF employs motion and appearance features, the second level employs spatial connections, and the third level employs temporal features. Combining these characteristics aids the analysis of crowd behaviour. MFF was evaluated on the web dataset, with accuracy, precision, and recall as the evaluation metrics. In a comparative analysis with various existing methodologies, MFF achieved an accuracy above 99 %.
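
For reference, the three evaluation metrics named above can be computed as in the minimal sketch below. It assumes the standard definitions and a binary labelling of clips (for example, 1 for abnormal and 0 for normal crowd behaviour). The abstract does not state how the labels are encoded, so the function name `classification_metrics`, the label encoding, and the sample labels are all hypothetical and serve only as illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels.

    Standard definitions, with TP/FP/FN/TN counted relative to `positive`:
      accuracy  = (TP + TN) / total
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
    A ratio whose denominator is zero is reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels for eight clips (1 = abnormal, 0 = normal); not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Reporting precision and recall alongside accuracy is useful when abnormal events are rare, because accuracy alone can remain high even if most abnormal clips are missed.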