Many of the large datasets analyzed today are graphs. In many real-world applications, summarizing a large graph is beneficial (or even necessary) because reducing its size brings several benefits, including 1) significant speed-ups for graph algorithms, 2) reduced storage space, 3) faster network transmission, 4) improved data privacy, and 5) more effective graph visualization. During summarization, potentially useful information is removed from the graph, since nodes and edges are removed or transformed. Consequently, an important problem with graph summarization is that, while it reduces the size of the input graph, it also reduces the graph's utility. The key question we pose in this paper is: can we summarize and compress a graph while ensuring that its utility does not drop below a user-specified threshold? We explore this question and propose a novel iterative, utility-driven graph summarization approach. During iterative summarization, we incrementally track the utility of the graph summary, which enables a user to query for a summary conditioned on a user-specified utility value. We present both exhaustive and scalable approaches for implementing the proposed solution. Our experimental results on real-world graph datasets show the effectiveness of the approach. Finally, through multiple real-world applications, we demonstrate the practicality of our notion of the utility of a computed graph summary.
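The abstract does not define the utility measure or the merge strategy, so the following is only a minimal sketch under stated assumptions. It uses a hypothetical utility: the fraction of node pairs whose adjacency is reconstructed correctly, where a superedge is kept if and only if more than half of the underlying node pairs are real edges. It also uses an exhaustive greedy loop that merges the least lossy pair of supernodes at each step and records the utility after every merge, so a summary can later be queried by threshold. The names `summarize`, `summary_error`, and `query` are illustrative and are not the paper's API.

```python
from itertools import combinations


def summary_error(edges, partition):
    """Count node pairs whose adjacency the summary reconstructs incorrectly.

    Hypothetical reconstruction rule: a superedge between supernodes A and B
    (or inside A when A == B) is kept iff more than half of the possible node
    pairs are real edges, so each block pair contributes min(actual, missing).
    """
    error = 0
    for i in range(len(partition)):
        for j in range(i, len(partition)):
            a, b = partition[i], partition[j]
            if i == j:
                pairs = list(combinations(sorted(a), 2))
            else:
                pairs = [(u, v) for u in a for v in b]
            actual = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
            error += min(actual, len(pairs) - actual)
    return error


def summarize(nodes, edge_list):
    """Greedily merge supernodes, recording (partition, utility) after every step."""
    edges = {frozenset(e) for e in edge_list}
    denom = max(len(nodes) * (len(nodes) - 1) // 2, 1)  # number of node pairs

    def utility(partition):
        # Hypothetical utility: fraction of node pairs reconstructed correctly.
        return 1 - summary_error(edges, partition) / denom

    partition = [frozenset([v]) for v in nodes]
    history = [(partition, utility(partition))]
    while len(partition) > 1:
        best = None
        # Exhaustive step: try every pair of supernodes, keep the least lossy merge.
        for i, j in combinations(range(len(partition)), 2):
            rest = [g for k, g in enumerate(partition) if k not in (i, j)]
            candidate = rest + [partition[i] | partition[j]]
            err = summary_error(edges, candidate)
            if best is None or err < best[0]:
                best = (err, candidate)
        partition = best[1]
        history.append((partition, utility(partition)))
    return history


def query(history, threshold):
    """Return the smallest recorded summary whose utility is at least `threshold`."""
    feasible = [(p, u) for p, u in history if u >= threshold]
    return min(feasible, key=lambda pu: len(pu[0]))


# Usage: two triangles joined by the edge (2, 3).
history = summarize(range(6), [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
summary, u = query(history, 1.0)
print(len(summary), u)  # prints: 4 1.0
```

Keeping the full history means each threshold query is a scan over summaries that have already been computed, with no rerun needed. The exhaustive pair search rescores every candidate merge from scratch, so it only suits toy graphs. That cost is why a scalable variant matters in practice.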
This paper considers metadata generation and tracking in a collaborative environment where users publish raw sensor data in the form of virtual sensors and post-process the data by means of filtering, modeling, or query processing techniques. In the metadata system described, data from different sources with different provenance is enriched with further metadata at each processing step, describing the processing applied and/or observations that may explain anomalies in the data. The management of this enriched data is the subject of this paper. In the context of sensor data processing, particularly in the environmental sciences, there is still a large gap between data acquisition and metadata gathering, further complicated by the problem of combining the two. This paper attempts to bridge the gap between data management and semantic annotation. It describes a user-friendly, easily deployable system for gathering sensor metadata and capturing the semantics behind higher-level data processing steps; these semantics are particularly useful for understanding data processing workflows. Furthermore, it examines different methods of querying gathered data and of exporting it to and importing it from higher-level applications.
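Per-step metadata enrichment can be shown with a minimal in-memory Python sketch. It assumes that each processing step returns a new immutable stream whose metadata is the parent's history plus one entry. `SensorStream`, `MetadataEntry`, `apply`, `annotate`, `history`, and the station identifier are all hypothetical. This is not the paper's system, which would persist and semantically annotate these records rather than keep them in memory.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class MetadataEntry:
    step: str          # processing step, e.g. "filter" or "observation"
    description: str   # what was done, or a note explaining an anomaly


@dataclass(frozen=True)
class SensorStream:
    source: str                                # hypothetical virtual-sensor identifier
    values: Tuple[float, ...]
    metadata: Tuple[MetadataEntry, ...] = ()   # ordered processing history

    def apply(self, step: str, description: str,
              fn: Callable[[Tuple[float, ...]], Iterable[float]]) -> "SensorStream":
        """Run a processing step; return a new stream with the step's metadata appended."""
        return SensorStream(self.source, tuple(fn(self.values)),
                            self.metadata + (MetadataEntry(step, description),))

    def annotate(self, step: str, note: str) -> "SensorStream":
        """Attach an observation without changing the data values."""
        return SensorStream(self.source, self.values,
                            self.metadata + (MetadataEntry(step, note),))

    def history(self, step: Optional[str] = None) -> list:
        """Return metadata descriptions, optionally only those of one step type."""
        return [m.description for m in self.metadata if step is None or m.step == step]


raw = SensorStream("station-42/air-temperature", (12.1, 12.4, 85.0, 12.6))
filtered = raw.apply("filter", "dropped readings above 50 degC as glitches",
                     lambda vs: [v for v in vs if v <= 50])
annotated = filtered.annotate("observation", "spike coincided with a maintenance visit")
print(annotated.values)  # (12.1, 12.4, 12.6)
print(annotated.history())
```

Because streams are immutable, the raw data and every intermediate result keep their own processing history. A later query can therefore tell which steps, and which recorded observations, stand behind a given set of values.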