In recent years, video event detection has attracted considerable attention in the research community because of its widespread applications. In this paper, a new model is proposed for detecting human actions in video sequences. First, the videos are acquired from the University of Central Florida (UCF) 101, Human Motion Database (HMDB) 51, and Columbia Consumer Video (CCV) datasets. Next, the DenseNet201 model is employed to extract deep features from the acquired videos. Further, the Improved Gray Wolf Optimization (IGWO) algorithm is developed to select active/relevant features, which reduces computational time and system complexity. In the IGWO, leader enhancement and competitive strategies are employed to improve the convergence rate and to prevent the algorithm from falling into local optima. Finally, the Bi-directional Long Short-Term Memory (BiLSTM) network is used for event classification (101 action types in UCF101, 51 action types in HMDB51, and 20 action types in CCV). In the experiments, the IGWO-based BiLSTM network achieved 94.73%, 96.53%, and 93.91% accuracy on the UCF101, HMDB51, and CCV datasets, respectively.
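The following is a minimal sketch of the described pipeline (per-frame DenseNet201 features, a feature-selection step, and a BiLSTM classifier), assuming a Keras/TensorFlow implementation. The frame count, mask size, and layer widths are illustrative assumptions, and the random mask merely stands in for the IGWO-selected feature indices; it is not the paper's optimization procedure.

```python
# Sketch of: DenseNet201 features -> feature selection -> BiLSTM classifier.
# Assumptions: 16 frames per clip, 224x224 RGB input, 512 selected features,
# UCF101-style 101 classes. These values are illustrative, not from the paper.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet201

NUM_FRAMES, NUM_CLASSES = 16, 101

# Step 1: deep features per frame from a frozen DenseNet201 backbone
# (global average pooling yields a 1920-dimensional vector per frame).
backbone = DenseNet201(weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False

def extract_features(frames):
    """frames: (NUM_FRAMES, 224, 224, 3) array -> (NUM_FRAMES, 1920) features."""
    frames = tf.keras.applications.densenet.preprocess_input(frames)
    return backbone.predict(frames, verbose=0)

# Step 2: IGWO would produce a binary mask over the 1920 features;
# a random index set stands in here purely for illustration.
selected = np.sort(np.random.choice(1920, size=512, replace=False))

# Step 3: BiLSTM classifier over the sequence of selected per-frame features.
classifier = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, len(selected))),
    layers.Bidirectional(layers.LSTM(128)),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

# Usage (single clip): feats = extract_features(video_frames)[:, selected][None, ...]
#                      probs = classifier.predict(feats)
```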