Wireless Video Sensor Networks (WVSNs) have recently become one of the most widely used technologies for surveillance, event tracking, natural-disaster monitoring and other sudden events. These networks are composed of small embedded camera motes that extract the needed information from the monitored zone of interest. A WVSN is divided into three layers: the video sensor-node layer, the coordinator layer and the sink. Every video sensor node captures raw images and videos and sends them to the coordinator, which analyzes them and forwards the analyzed data to the sink. In a typical scenario, the volume of images and videos collected by the sensor nodes of a single network is huge. Sending every image from every sensor node to the coordinator consumes a lot of energy on each sensor and may cause a bottleneck. In this paper, processing and analysis based on the similarity between frames are added at the sensor-node level so that only the important frames are sent to the coordinator. Kinematic functions are defined to predict the next step of the intrusion and to schedule the monitoring system accordingly. Compared to a purely prediction-based scheduling approach, this approach minimizes transmission on the network. It thus reduces the energy consumption and the likelihood of a bottleneck while guaranteeing the detection of all critical events at the sensor-node level, as shown in the experiments.
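
To make the two ideas above concrete, the following is a minimal sketch, not the method evaluated in this paper. It assumes a mean absolute pixel difference as the similarity measure, a fixed threshold, and a constant-acceleration kinematic model; the names `frame_difference`, `select_frames`, `predict_position` and the value `threshold=0.05` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a flattened list of grayscale pixels in 0..255.
Frame = Sequence[int]


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference, normalised to the range [0, 1]."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / (255.0 * len(a))


def select_frames(frames: List[Frame], threshold: float = 0.05) -> List[int]:
    """Return the indices of frames worth sending to the coordinator.

    Assumption: the first frame is always sent, and each later frame is
    compared with the last frame that was actually sent.
    """
    sent: List[int] = []
    last_sent = None
    for index, frame in enumerate(frames):
        if last_sent is None or frame_difference(frame, last_sent) > threshold:
            sent.append(index)
            last_sent = frame
    return sent


def predict_position(
    position: Tuple[float, ...],
    velocity: Tuple[float, ...],
    acceleration: Tuple[float, ...],
    dt: float,
) -> Tuple[float, ...]:
    """Constant-acceleration kinematics: p + v*dt + 0.5*a*dt^2 on each axis."""
    return tuple(
        p + v * dt + 0.5 * a * dt * dt
        for p, v, a in zip(position, velocity, acceleration)
    )


if __name__ == "__main__":
    frames = [[0, 0, 0, 0], [0, 0, 0, 1], [255, 255, 0, 0], [255, 255, 0, 0]]
    print(select_frames(frames))
    print(predict_position((0.0, 0.0), (1.0, 2.0), (0.0, -2.0), 2.0))
```

In this sketch a sensor node would transmit only the frames whose indices `select_frames` returns, and the predicted position could be used to decide which neighbouring nodes to wake next; how the scheduling decision is actually made is not specified here.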