Abstract. The independent set problem is NP-hard and particularly difficult to solve in large sparse graphs. In this work, we develop an advanced evolutionary algorithm that incorporates kernelization techniques to compute large independent sets in huge sparse networks. A recent exact algorithm has shown that large networks can be solved exactly by employing a branch-and-reduce technique that recursively kernelizes the graph and performs branching. However, one major drawback of that algorithm is that, on huge graphs, branching can still take exponential time. To avoid this problem, we recursively choose vertices that are likely to be in a large independent set (using an evolutionary approach) and then further kernelize the graph. We show that identifying and removing vertices likely to be in large independent sets opens up the reduction space, which not only speeds up the computation of large independent sets drastically but also enables us to compute high-quality independent sets on much larger instances than previously reported in the literature.
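A minimal Python sketch of the reduce-then-peel idea, assuming an undirected graph stored as a dict of neighbor sets. It applies only two simple exact reductions (a vertex of degree 0 or 1 belongs to some maximum independent set) and, when neither applies, peels a minimum-degree vertex and keeps reducing. The min-degree rule is a hypothetical stand-in for the evolutionary vertex selection described above, not the authors' method, and the names `reduce_and_peel` and `take` are illustrative.

```python
def reduce_and_peel(adj):
    """Greedy independent set: exact degree-0/1 reductions plus min-degree peeling.

    adj: dict vertex -> set of neighbours (undirected, symmetric).
    """
    g = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    solution = set()

    def take(v):
        # Put v into the solution and delete its closed neighbourhood {v} | N(v).
        solution.add(v)
        for u in {v} | g[v]:
            for w in g.pop(u):
                if w in g:
                    g[w].discard(u)

    while g:
        # Exact reduction: a vertex of degree <= 1 lies in some maximum independent set.
        low = next((v for v in g if len(g[v]) <= 1), None)
        if low is not None:
            take(low)
        else:
            # No reduction applies: peel a min-degree vertex (hypothetical heuristic).
            take(min(g, key=lambda v: len(g[v])))
    return solution
```

On the path 1-2-3-4-5 the degree-1 reduction fires repeatedly and the sketch returns {1, 3, 5}; on a 4-cycle no reduction applies at first, so one vertex is peeled, after which the opposite vertex becomes isolated and is taken by the reduction.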
Analyzing massive complex networks yields promising insights about our everyday lives. Building scalable algorithms to do so is a challenging task that requires a careful analysis and an extensive evaluation. However, engineering such algorithms is often hindered by the scarcity of publicly available datasets. Network generators serve as a tool to alleviate this problem by providing synthetic instances with controllable parameters. However, many network generators fail to provide instances on a massive scale due to their sequential nature or resource constraints. Additionally, truly scalable network generators are few and often limited in their realism. In this work, we present novel generators for a variety of network models that are frequently used as benchmarks. By making use of pseudorandomization and divide-and-conquer schemes, our generators follow a communication-free paradigm. The resulting generators are thus embarrassingly parallel and have near-optimal scaling behavior. This allows us to generate instances of up to 2^43 vertices and 2^47 edges in less than 22 minutes on 32 768 cores. Therefore, our generators allow new graph families to be used on an unprecedented scale.
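To illustrate the communication-free paradigm, here is a small Python sketch for the Erdős–Rényi G(n, p) model, chosen only for simplicity. The vertex range is split into a fixed number of chunks, and each chunk's random stream is derived by hashing (seed, chunk id), so any processing element (PE) can regenerate any chunk on its own and the output does not depend on how many PEs are used. The per-chunk loop is quadratic and only demonstrates the seeding scheme; `chunk_rng`, `gnp_chunk`, and `generate` are hypothetical names.

```python
import hashlib
import random


def chunk_rng(seed, chunk):
    # Derive an independent, reproducible random stream from (seed, chunk id).
    digest = hashlib.sha256(f"{seed}:{chunk}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def gnp_chunk(n, p, seed, chunk, num_chunks):
    """Undirected G(n, p) edges (u, v), u < v, whose source u lies in this chunk."""
    lo = chunk * n // num_chunks
    hi = (chunk + 1) * n // num_chunks
    rng = chunk_rng(seed, chunk)
    return [(u, v) for u in range(lo, hi) for v in range(u + 1, n) if rng.random() < p]


def generate(n, p, seed, num_chunks, pe, num_pes):
    # PE `pe` handles chunks pe, pe + num_pes, ...; no messages are exchanged.
    edges = []
    for c in range(pe, num_chunks, num_pes):
        edges.extend(gnp_chunk(n, p, seed, c, num_chunks))
    return edges
```

Generating with one PE, or splitting the same eight chunks over four PEs and taking the union, yields exactly the same edge set, because each chunk's edges depend only on (seed, chunk).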
One powerful technique to solve NP-hard optimization problems in practice is branch-and-reduce search, which is branch-and-bound that intermixes branching with reductions to decrease the input size. While this technique is known to be very effective in practice for unweighted problems, very little is known about weighted problems, in part due to a lack of known effective reductions. In this work, we develop a full suite of new reductions for the maximum weight independent set problem and provide extensive experiments to show their effectiveness in practice on real-world graphs of up to millions of vertices and edges. Our experiments indicate that our approach is able to outperform existing state-of-the-art algorithms, solving many instances that were previously infeasible. In particular, we show that branch-and-reduce is able to solve a large number of instances up to two orders of magnitude faster than existing (inexact) local search algorithms, and is able to solve the majority of instances within 15 minutes. For those instances remaining infeasible, we show that combining kernelization with local search produces higher-quality solutions than local search alone.
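As an example of a weight-aware reduction, the sketch below applies one simple rule exhaustively: if a vertex weighs at least as much as its whole neighborhood, some maximum weight independent set contains it (swapping out any chosen neighbors cannot lose weight), so it can be fixed and its closed neighborhood deleted. This is a single illustrative rule, not the paper's full suite of reductions; the function name and the dict-of-sets graph format are assumptions. The returned kernel is what a branch-and-reduce solver or a local search would then work on.

```python
def neighborhood_removal(adj, weight):
    """Exhaustively fix every v with weight[v] >= total weight of N(v).

    adj: dict vertex -> set of neighbours (undirected); weight: dict vertex -> number.
    Returns (fixed vertices, remaining kernel graph).
    """
    g = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    fixed = set()
    changed = True
    while changed:
        changed = False
        for v in list(g):
            if v not in g:  # already deleted earlier in this sweep
                continue
            if weight[v] >= sum(weight[u] for u in g[v]):
                fixed.add(v)
                # Delete the closed neighbourhood {v} | N(v) from the graph.
                for u in {v} | g[v]:
                    for w in g.pop(u):
                        if w in g:
                            g[w].discard(u)
                changed = True
    return fixed, g
```

On a path a-b-c with weights 1, 3, 1, vertex b is fixed and the kernel is empty; on a 4-cycle with unit weights the rule never fires and the whole graph remains as the kernel.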
We consider the problem of sampling n numbers from the range {1, ..., N} without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time O(n/p + log p) on p processors. The amount of communication between the processors is very small and independent of the sample size. We also discuss modifications needed for load balancing, reservoir sampling, online sampling, sampling with replacement, Bernoulli sampling, and vectorization on SIMD units or GPUs.
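A divide-and-conquer sampler of this kind can be sketched in Python as follows, assuming the range is split in half and the number of the n samples that fall into the left half is drawn from a hypergeometric distribution; small subproblems fall back to `random.sample`. The helper `hypergeometric` is a simple O(draws) simulation chosen so the example needs only the standard library, and `sample` and its `base` cutoff are hypothetical names. In a parallel version, each split could be seeded from its position in the recursion tree so that processors agree on the splits without communicating; that part is not shown.

```python
import random


def hypergeometric(rng, good, bad, draws):
    """Count 'good' items among `draws` picks without replacement (simple simulation)."""
    hits = 0
    for _ in range(draws):
        if rng.random() * (good + bad) < good:
            hits += 1
            good -= 1
        else:
            bad -= 1
    return hits


def sample(rng, lo, hi, n, base=64):
    """Return n distinct integers from the inclusive range [lo, hi]."""
    size = hi - lo + 1
    if n <= base:
        return rng.sample(range(lo, hi + 1), n)  # small case: sample directly
    left_size = size // 2  # left half is [lo, lo + left_size - 1]
    k = hypergeometric(rng, left_size, size - left_size, n)
    mid = lo + left_size
    # Each half is an independent subproblem, which is what makes it cache friendly.
    return sample(rng, lo, mid - 1, k, base) + sample(rng, mid, hi, n - k, base)
```

For example, `sample(random.Random(1), 1, 10_000, 500)` returns 500 distinct values from {1, ..., 10000}.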
We present the design and a first performance evaluation of Thrill, a prototype of a general-purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink, with at least two main differences. First, Thrill is based on C++, which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure, which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping). We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source codes have a similar level of simplicity and abstraction.
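Thrill fuses chains of local operations at compile time in C++; the Python sketch below only mimics the effect at run time by composing hypothetical `map` and `filter` stages into a single per-element pass with no intermediate lists, and then shows two of the array operations mentioned above (prefix sums and zipping) using `itertools.accumulate` and `zip`. None of these names correspond to Thrill's actual API.

```python
from itertools import accumulate


def fuse(*stages):
    """Combine ("map", f) / ("filter", pred) stages into one per-element pass."""
    def run(items):
        out = []
        for x in items:
            for kind, fn in stages:
                if kind == "map":
                    x = fn(x)
                elif kind == "filter":
                    if not fn(x):
                        break  # element dropped; skip the remaining stages
                else:
                    raise ValueError(f"unknown stage kind: {kind}")
            else:
                out.append(x)  # element survived every stage
        return out
    return run


squares_of_evens = fuse(("filter", lambda x: x % 2 == 0), ("map", lambda x: x * x))
data = [1, 2, 3, 4, 5, 6]
fused = squares_of_evens(data)   # one pass over data: [4, 16, 36]
prefix = list(accumulate(fused))  # prefix sums: [4, 20, 56]
pairs = list(zip(data, prefix))   # zipping: [(1, 4), (2, 20), (3, 56)]
```

Because the stages run back to back on each element, no buffer is materialized between the filter and the map, which is the property the abstract attributes to Thrill's compiled operation chains.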