Most existing feature descriptors for video have limited representation ability. In order to improve the recognition accuracy of method for detecting the videos that include violent scenes and take advantage of the logical structure of video sequences, a novel feature constructing approach based on three dimensional histograms of gradient orientation (HOG3D), the Bag of Visual Words (BoVW) model, and feature pooling technology is proposed. This approach, combined with kernel extreme learning machine (KELM), can be used to detect violent scene. First, the HOG3D feature is extracted on the block level for video, and then the K-Means clustering algorithm is implemented to generate visual words. Then, the bag of visual words framework is used for the quantization of feature. And the feature pooling technology is operated to generate a feature vector for an entire video segment, and feature vectors of training data and testing data were used separately to train the model and evaluate the performance of the proposed approach. The experimental results showed that the proposed feature descriptor had good representation and generalization abilities. The proposed approach is efficient for violent scene detection, and the accuracy matches the best result on Hockey dataset, and it outperforms state-of-the-art on Movies.