This thesis addresses major challenges in spatiotemporal data mining. Spatiotemporal data are data that relate to both space and time, and spatiotemporal data mining is the process of discovering patterns and extracting knowledge from such data. Examples include the trajectories of moving vehicles (e.g., GPS-equipped buses or taxis) that travel around a city reporting their GPS location along with the timestamp of each measurement, and the trajectories of animals, which zoologists analyze to study ecological behavior. Ubiquitous mobile devices generate massive datasets of GPS trajectories that are particularly difficult to interpret and analyze. One example of a spatiotemporal data mining task is the extraction of movement patterns from a collection of moving objects. Another, focusing on urban environments, is traffic modelling: the analysis of real-time traffic reports in order to extract the current traffic conditions across the road network and to forecast future traffic conditions. Detecting and forecasting traffic conditions is particularly important for estimating the travel time of a given path. In addition, traditional centralized systems are unable to cope with the current massive scale of spatiotemporal data, mainly because a single host cannot receive and process all incoming data in real time. For this reason, there is a shift towards Distributed Stream Processing Systems (DSPS).

The contributions of this thesis focus on three main topic areas: (i) discovering corridors from GPS trajectories, (ii) hybrid travel time estimation, and (iii) distributed complex event processing. Below we provide a short description of each of these topics:
Discovering corridors from GPS trajectories
We aim to detect paths that are frequently followed by the moving objects in a spatiotemporal database. Our main hypothesis is that even if moving objects have different origins and destinations, they follow common paths. Detecting such common paths is particularly demanding due to the complex nature of spatiotemporal data. In this thesis we propose a pipelined algorithm that discovers such frequent paths. The proposed technique first discovers areas that are frequently traversed together by the moving objects and then discovers frequent paths using a greedy algorithm.
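To make the two-stage idea concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the thesis. It assumes that "areas" are cells of a uniform grid, that two areas are "followed together" when enough trajectories move directly from one cell to the next, and that the greedy step only extends a path forward. The names `to_cells`, `frequent_transitions`, `greedy_corridor`, `cell_size` and `min_support` are hypothetical.

```python
from collections import Counter


def to_cells(trajectory, cell_size):
    """Map (x, y) points to grid cells, dropping consecutive repeats."""
    cells = []
    for x, y in trajectory:
        cell = (int(x // cell_size), int(y // cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


def frequent_transitions(trajectories, cell_size, min_support):
    """Stage 1 (assumed form): keep cell-to-cell transitions that occur
    in at least `min_support` distinct trajectories."""
    support = Counter()
    for trajectory in trajectories:
        cells = to_cells(trajectory, cell_size)
        # A set counts each transition at most once per trajectory.
        support.update(set(zip(cells, cells[1:])))
    return {edge: n for edge, n in support.items() if n >= min_support}


def greedy_corridor(frequent):
    """Stage 2 (assumed form): start from the best-supported transition
    and repeatedly append the best-supported unvisited next cell."""
    if not frequent:
        return []
    start = max(frequent, key=lambda edge: (frequent[edge], edge))
    path = [start[0], start[1]]
    visited = set(path)
    while True:
        candidates = [
            (n, edge)
            for edge, n in frequent.items()
            if edge[0] == path[-1] and edge[1] not in visited
        ]
        if not candidates:
            return path
        _, best = max(candidates)
        path.append(best[1])
        visited.add(best[1])


# Three toy trajectories with different origins and destinations
# that share the stretch from cell (1, 0) to cell (2, 0).
trajectories = [
    [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (3.5, 0.5)],
    [(1.5, 0.5), (2.5, 0.5), (3.5, 0.5)],
    [(1.5, 1.5), (1.5, 0.5), (2.5, 0.5), (2.5, 1.5)],
]
frequent = frequent_transitions(trajectories, cell_size=1.0, min_support=2)
corridor = greedy_corridor(frequent)
print(corridor)
```

In this toy input the three objects start and end in different cells, yet the sketch recovers a shared path through the cells they have in common, which mirrors the hypothesis stated above. A real implementation would also need to handle noisy GPS samples and time, both of which are ignored here.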
Hybrid travel time estimation
This thesis also focuses on the analysis of traffic conditions across the road network, and more specifically on the travel time estimation problem: estimating the travel time from one location of the road network to another, given the path that will be followed and the time of departure. This problem is particularly challenging since travel time depends on multiple factors that are difficult to model. In this thesis we propose a hybrid travel time estimation technique, named HTTE. Our technique performs travel time estimation for a given path (map-matched to the road network), using da...