External stimulation, mood swings, and physiological arousal are closely related and mutually induce one another, so exploring the internal relations among these three aspects is of considerable significance. Currently, video is the most popular multimedia stimulus, able to express rich emotional semantics through its visual and auditory features. Beyond video features, human electroencephalography (EEG) features can provide useful information for video emotion recognition, as they offer direct, instantaneous, and authentic feedback on individual human perception. In this paper, we collected EEG data from 39 participants while they watched emotional video clips and built a fused dataset of EEG and video features. We then applied machine-learning algorithms, including Liblinear, REPTree, XGBoost, MultilayerPerceptron, RandomTree, and RBFNetwork, to obtain the optimal model for video emotion recognition on this multi-modal dataset. We found that fusing all-band EEG power spectral density (PSD) features with video audio-visual features achieves the best recognition results: classification accuracy reaches 96.79% for valence (Positive/Negative) and 97.79% for arousal (High/Low). The study shows that this approach is a promising method of video emotion indexing for video information retrieval.
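To make the fusion-and-classification pipeline concrete, the sketch below shows feature-level fusion (concatenating per-clip EEG PSD features with audio-visual features) followed by cross-validated XGBoost classification of valence. All variable names, feature dimensions, and hyperparameters are illustrative assumptions for exposition; this is not the authors' released code, and the random placeholder data stands in for the real extracted features.

```python
# Minimal sketch: fuse per-clip EEG PSD features with video audio-visual
# features and classify valence with XGBoost. Shapes, names, and settings
# are hypothetical; the paper does not publish this implementation.
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n_clips = 200                                 # hypothetical number of labeled clips

# Assumed pre-extracted features: all-band EEG PSD values (e.g. one value
# per channel x frequency band) and audio-visual descriptors per clip.
eeg_psd = rng.normal(size=(n_clips, 160))     # placeholder EEG PSD matrix
av_feats = rng.normal(size=(n_clips, 64))     # placeholder audio-visual matrix
valence = rng.integers(0, 2, size=n_clips)    # placeholder labels: 0=Neg, 1=Pos

# Feature-level fusion: concatenate the two modalities for each clip.
fused = np.hstack([eeg_psd, av_feats])

clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
scores = cross_val_score(clf, fused, valence, cv=5, scoring="accuracy")
print(f"5-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The same pattern applies to the arousal (High/Low) label, and swapping `XGBClassifier` for another scikit-learn-compatible estimator reproduces the kind of algorithm comparison the abstract describes.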