Recently, malicious video classification and filtering techniques are of practical interest as ones can easily access to malicious multimedia contents through the Internet, IPTV, online social network, and etc. Considerable research efforts have been made to developing malicious video classification and filtering systems. However, the malicious video classification and filtering is not still being from mature in terms of reliable classification/filtering performance. In particular, the most of conventional approaches have been limited to using only the spatial features (such as a ratio of skin regions and bag of visual words) for the purpose of malicious image classification. Hence, previous approaches have been restricted to achieving acceptable classification and filtering performance. In order to overcome the aforementioned limitation, we propose new malicious video classification framework that takes advantage of using both the spatial and temporal features that are readily extracted from a sequence of video frames. In particular, we develop the effective temporal features based on the motion periodicity feature and temporal correlation. In addition, to exploit the best data fusion approach aiming to combine the spatial and temporal features, the representative data fusion approaches are applied to the proposed framework. To demonstrate the effectiveness of our method, we collect 200 sexual intercourse videos and 200 non-sexual intercourse videos. Experimental results show that the proposed method increases 3.75% (from 92.25% to 96%) for classification of sexual intercourse video in terms of accuracy. Further, based on our experimental results, feature-level fusion approach (for fusing spatial and temporal features) is found to achieve the best classification accuracy.