Detecting arbitrarily moving objects, including vehicles and human beings, in real environments such as protected and sensitive areas is challenging because shaky cameras and wind cause arbitrary deformations and motion directions. This work adopts a spatio-temporal approach to classifying arbitrarily moving objects. The proposed method segments foreground objects from the background by taking the difference between the median frame and each individual frame, which yields one foreground image per frame. The mean of these foreground images is computed and is referred to as the mean activation map. The method applies the Fast Fourier Transform (FFT) to the mean activation map, which outputs amplitudes and frequencies. The mean of the frequencies is then computed over the activation maps of the temporal frames and is used as the frequency feature vector. The features are normalized to avoid the problems of imbalanced features and class sizes. For classification, the work uses 10-fold cross-validation to split the data into training and testing samples, and a random forest classifier performs the final classification of videos as arbitrarily moving or static. To evaluate the proposed method, we construct our own dataset, which contains videos of static objects and of objects moving arbitrarily because of shaky cameras and wind. The results on this dataset show that the proposed method achieves state-of-the-art performance.
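The feature-extraction steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the grayscale `(T, H, W)` input layout, the use of the absolute difference, the averaging of the amplitude spectrum along one axis to obtain a fixed-length vector, and the per-vector min-max normalization are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def mean_activation_map(frames):
    """Average foreground response of a clip.

    frames: array of shape (T, H, W), assumed to be grayscale frames.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Median over time serves as the background estimate.
    median_frame = np.median(frames, axis=0)
    # Frame difference against the median frame (absolute value is an assumption).
    foreground = np.abs(frames - median_frame)
    # Mean of the per-frame foreground images.
    return foreground.mean(axis=0)


def frequency_features(frames):
    """Normalized frequency feature vector of length W (hypothetical layout)."""
    activation = mean_activation_map(frames)
    # 2-D FFT of the mean activation map; keep the amplitude spectrum.
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(activation)))
    # Assumption: average the amplitudes along the vertical axis
    # to obtain one value per horizontal frequency bin.
    features = amplitude.mean(axis=0)
    # Assumption: min-max normalization of each vector to [0, 1].
    span = features.max() - features.min()
    if span == 0:
        return np.zeros_like(features)
    return (features - features.min()) / span
```

For a clip stored as a `(T, H, W)` array, `frequency_features(clip)` returns one vector per video. A perfectly static clip yields an all-zero activation map and therefore an all-zero feature vector under these assumptions. The resulting vectors could then be passed to a random forest evaluated with 10-fold cross-validation, for example with scikit-learn's `RandomForestClassifier` and `cross_val_score`, which is one possible way to realize the classification step.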