A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-Hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for streaming graph data. In this paper, we propose the first incremental k-core decomposition algorithms for streaming graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have to be updated, and efficiently process this subgraph to update the k-core decomposition. Our results show a significant reduction in run-time compared to non-incremental alternatives. We show the efficiency of our algorithms on different types of real and synthetic graphs, at different scales. For a graph of 16 million vertices, we observe speedups reaching a million times, relative to the non-incremental algorithms.
The IBM Streams Processing Language (SPL) is the programming language for IBM InfoSphere A Streams, a platform for analyzing Big Data in motion. By BBig Data in motion,[ we mean continuous data streams at high data-transfer rates. InfoSphere Streams processes such data with both high throughput and short response times. To meet these performance demands, it deploys each application on a cluster of commodity servers. SPL abstracts away the complexity of the distributed system, instead exposing a simple graph-of-operators view to the user. SPL has several innovations relative to prior streaming languages. For performance and code reuse, SPL provides a code-generation interface to C++ and Java A. To facilitate writing well-structured and concise applications, SPL provides higher-order composite operators that modularize stream sub-graphs. Finally, to enable static checking while exposing optimization opportunities, SPL provides a strong type system and user-defined operator models. This paper provides a language overview, describes the implementation including optimizations such as fusion, and explains the rationale behind the language design.
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in
Many streaming applications demand continuous processing of live data with little or no downtime, therefore, making high-availability a crucial operational requirement. Fault tolerance techniques are generally expensive and when directly applied to streaming systems with stringent throughput and latency requirements, they might incur a prohibitive performance overhead. This paper describes a flexible, light-weight fault tolerance solution in the context of the SPADE language and the System S distributed stream processing engine. We devised language extensions so users can define and parameterize checkpoint policies easily. This configurable fault tolerance solution is implemented through code generation in SPADE, which reduces the overall application fault tolerance costs by incurring them only for the parts of the application that require it. In this paper we focus on the overall design of our checkpoint mechanism and we also describe an incremental checkpointing algorithm that is suitable for on-thefly processing of high-rate data streams.
This paper describes an experimental methodology used to evaluate the effectiveness of partial fault tolerance (PFT) techniques in data stream processing applications. Without a clear understanding of the impact of faults on the quality of the application output, applying PFT techniques in practice is not viable. We assess the impact of PFT by injecting faults into a synthetic financial engineering application running on top of IBM's stream processing middleware, System S. The application output quality degradation is evaluated via an application-specific output score function. In addition, we propose four metrics that are aimed at assessing the impact of faults in different stream operators of the application flow graph with respect to predictability and availability. These metrics help the developer to decide where in the application he should place redundant resources. We show that PFT is indeed viable, which opens the way for considerably reducing the resource consumption when compared to fully consistent replicas.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.