Recently, many video monitoring systems utilize deep learning technologies to recognize locations and trajectories of people in video data. In video monitoring systems, a fast discovery of human groups is an important task for several applications, for example, crime surveillance, contact tracing, and customer behavior analysis. To tackle the demand, we propose a group tracking method. First, we propose a spatial proximity definition and define a novel query type, a group tracking query that considers characteristics of video data. A group tracking query retrieves the groups that travel for more than a certain amount of video frame within a certain distance. We propose an efficient query processing method that exploits the spatio-temporal characteristics of groups. Through extensive experiments using real-world datasets, we verify the efficiency and effectiveness of our query definition and query processing method.INDEX TERMS Spatio-temporal query processing, spatial data management, spatial databases, video query processing, video monitoring systems. FIGURE 2. An example for perspective projection of a circle. When a circle (in Figure 2a) is observed by a camera that has a fixed position, it is projected into an ellipse as in Figure 2b.'search all the video segments with the length of 10 seconds in which two people A and B appear continuously'. For the temporal query processing, they use the modified clusteringand-intersection algorithm to be tolerant to object occlusion. However, their method considers all the objects in a frame as a single group.