The wide availability of tracking devices has drastically increased the role of geolocation in social networks, resulting in new commercial applications; for example, marketers can identify current trending topics within a region of interest and focus their products accordingly. In this paper we study a basic analytics query on geotagged data, namely: given a spatiotemporal region, find the most frequent terms among the social posts in that region. While there has been prior work on keyword search on spatial data (find the objects nearest to the query point that contain the query keywords), and on group keyword search on spatial data (retrieving groups of objects), our problem is different in that it returns keywords and aggregated frequencies as output, instead of having the keyword as input. Moreover, we differ from works addressing the streamed version of this query in that we operate on large, disk resident data and we provide exact answers. We propose an index structure and algorithms to efficiently answer such top-k spatiotemporal range queries, which we refer as Top-k Frequent Spatiotemporal Terms (kFST) queries. Our index structure employs an R-tree augmented by top-k sorted term lists (STLs), where a key challenge is to balance the size of the index to achieve faster execution and smaller space requirements. We theoretically study and experimentally validate the ideal length of the stored term lists, and perform detailed experiments to evaluate the performance of the proposed methods compared to baselines on real datasets.
Abstract. In this paper, we revisit the much studied problem of Pattern Matching with Swaps (Swap Matching problem, for short). We first present a graph-theoretic model, which opens a new and so far unexplored avenue to solve the problem. Then, using the model, we devise two efficient algorithms to solve the swap matching problem. The resulting algorithms are adaptations of the classic shift-and algorithm. For patterns having length similar to the word-size of the target machine, both the algorithms run in linear time considering a fixed alphabet.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.