The wide availability of networked sensors such as GPS and cameras is enabling the creation of sensor networks that generate huge amounts of data. For example, vehicular sensor networks, where in-car GPS sensor probes are used to model and monitor traffic, can generate on the order of gigabytes of data in real time. How can we compress streaming high-frequency data from distributed sensors? In this paper we construct coresets for streaming motion. The coreset of a data set is a small set that approximately represents the original data. Running queries or fitting models on the coreset yields results similar to those obtained on the original data set. We present an algorithm for computing a small coreset of a large sensor data set. Surprisingly, the size of the coreset is independent of the size of the original data set. Combining map-and-reduce techniques with our coreset yields a system capable of compressing in parallel a stream of O(n) points using space and update time that is only O(log n). We provide experimental results and compare the algorithm to the popular Douglas-Peucker heuristic for compressing GPS data.
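For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the Douglas-Peucker heuristic on 2-D points. It is a textbook version, not the implementation used in the paper's experiments. The function names, the choice of point-to-segment (rather than point-to-line) distance, and the `tolerance` parameter are assumptions made for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from 2-D point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Return a subsequence of points whose polyline stays within
    `tolerance` of every dropped point (recursive textbook variant)."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    worst_index, worst_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst_dist:
            worst_index, worst_dist = i, d
    if worst_dist <= tolerance:
        # Every interior point is close enough: keep only the endpoints.
        return [first, last]
    # Otherwise keep the farthest point and recurse on both halves.
    left = douglas_peucker(points[:worst_index + 1], tolerance)
    right = douglas_peucker(points[worst_index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


# Hypothetical track: nearly flat, then a sharp detour.
track = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3), (5, 0)]
simplified = douglas_peucker(track, tolerance=0.5)
```

Unlike a coreset, the output size of this heuristic depends on the input and on the tolerance, and the heuristic comes with no approximation guarantee for downstream model fitting.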
This article describes iDiary, a system that takes as input GPS data streams generated by users’ phones and turns them into textual descriptions of the trajectories. The system features a user interface similar to Google Search that allows users to type text queries on their activities (e.g., “Where did I buy books?”) and receive textual answers based on their GPS signals. iDiary uses novel algorithms for semantic compression and trajectory clustering of massive GPS signals in parallel to compute the critical locations of a user. We encode these problems as follows. The k-segment mean is a k-piecewise linear function that minimizes the regression distance to the signal. The (k,m)-segment mean has an additional constraint that the projection of the k segments on R^d consists of only m ≤ k segments. A coreset for this problem is a smart compression of the input signal that allows computation of a (1+ε)-approximation to its k-segment or (k,m)-segment mean in O(n log n) time for arbitrary constants ε, k, and m. We use coresets to obtain a parallel algorithm that scans the signal in one pass, using space and update time per point that is polynomial in log n. Using an external database, we then map these locations to textual descriptions and activities so that we can apply text mining techniques on the resulting data (e.g., LSA or transportation mode recognition). We provide experimental results for both the system and algorithms and compare them to existing commercial and academic state of the art. This is the first GPS system that enables text-searchable activities from GPS data.
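To make the k-segment mean concrete, here is a minimal sketch of an exact dynamic program for a one-dimensional signal. It is not the coreset algorithm from the article. It assumes that "regression distance" means the sum of squared vertical residuals, that the k segments need not join continuously, and that the signal is given as time stamps `ts` and values `ys`. All names are hypothetical.

```python
def k_segment_mean(ts, ys, k):
    """Exact k-segment least-squares fit by dynamic programming.

    Returns (total_cost, bounds), where bounds is a list of k half-open
    index ranges (i, j), each fitted by its own regression line.
    Runs in O(k * n^2) time, which is why compressing the input first
    (for example with a coreset) matters for long signals.
    """
    n = len(ts)
    if len(ys) != n or not 1 <= k <= n:
        raise ValueError("need len(ts) == len(ys) and 1 <= k <= n")

    # Prefix sums let us score any range in O(1).
    pt, py, ptt, pyy, pty = [0.0], [0.0], [0.0], [0.0], [0.0]
    for t, y in zip(ts, ys):
        pt.append(pt[-1] + t)
        py.append(py[-1] + y)
        ptt.append(ptt[-1] + t * t)
        pyy.append(pyy[-1] + y * y)
        pty.append(pty[-1] + t * y)

    def fit_cost(i, j):
        """Squared residual of the best line through points i..j-1."""
        m = j - i
        st, sy = pt[j] - pt[i], py[j] - py[i]
        stt = (ptt[j] - ptt[i]) - st * st / m
        sty = (pty[j] - pty[i]) - st * sy / m
        syy = (pyy[j] - pyy[i]) - sy * sy / m
        if stt <= 1e-12:  # single point or identical time stamps
            return max(0.0, syy)
        return max(0.0, syy - sty * sty / stt)  # clamp rounding noise

    inf = float("inf")
    # best[m][j]: cheapest fit of the first j points with m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cost = best[m - 1][i] + fit_cost(i, j)
                if cost < best[m][j]:
                    best[m][j], back[m][j] = cost, i

    # Walk the back-pointers to recover the segment boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[k][n], bounds


# Hypothetical signal: rises linearly, then falls linearly.
ts = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 1, 2, 3, 3, 2, 1, 0]
cost_one, _ = k_segment_mean(ts, ys, 1)
cost_two, bounds_two = k_segment_mean(ts, ys, 2)
```

On this toy signal two segments fit the data exactly, while a single segment cannot. The (k,m)-segment variant adds the constraint on the spatial projections and is not covered by this sketch.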
This paper describes a system that takes as input GPS data streams generated by users' phones and creates a searchable database of locations and activities. The system, called iDiary, turns large GPS signals collected from smartphones into textual descriptions of the trajectories. The system features a user interface similar to Google Search that allows users to type text queries on their activities (e.g., "Where did I buy books?") and receive textual answers based on their GPS signals. iDiary uses novel algorithms for semantic compression (known as coresets) and trajectory clustering of massive GPS signals in parallel to compute the critical locations of a user. Using an external database, we then map these locations to textual descriptions and activities so that we can apply text mining techniques on the resulting data (e.g., LSA or transportation mode recognition). We provide experimental results for both the system and algorithms and compare them to existing commercial and academic state of the art. This is the first GPS system that enables text-searchable activities from GPS data.