Chaos scales graph processing from secondary storage to multiple machines in a cluster. Earlier systems that process graphs from secondary storage are restricted to a single machine, and therefore limited by the bandwidth and capacity of the storage system on a single machine. Chaos is limited only by the aggregate bandwidth and capacity of all storage devices in the entire cluster. Chaos builds on the streaming partitions introduced by X-Stream in order to achieve sequential access to storage, but parallelizes the execution of streaming partitions. Chaos is novel in three ways. First, Chaos partitions for sequential storage access, rather than for locality and load balance, resulting in much lower pre-processing times. Second, Chaos distributes graph data uniformly randomly across the cluster and does not attempt to achieve locality, based on the observation that in a small cluster, network bandwidth far outstrips storage bandwidth. Third, Chaos uses work stealing to allow multiple machines to work on a single partition, thereby achieving load balance at runtime. In terms of performance scaling, on 32 machines Chaos takes on average only 1.61 times longer to process a graph 32 times larger than on a single machine. In terms of capacity scaling, Chaos is capable of handling a graph with 1 trillion edges representing 16 TB of input data, a new milestone for graph processing capacity on a small commodity cluster.
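As a rough illustration of two of those ideas, the sketch below (plain Python with hypothetical names and sizes, not the Chaos implementation) spreads edge chunks uniformly at random across machines and lets a machine steal unprocessed chunks of other streaming partitions once its own queue is drained.

import random
from collections import deque

NUM_MACHINES   = 4   # hypothetical cluster size
NUM_PARTITIONS = 8   # streaming partitions, sized for sequential storage access

def place_chunks(chunks):
    # Uniform random placement of edge chunks onto machines; no locality is attempted.
    placement = {m: [] for m in range(NUM_MACHINES)}
    for chunk in chunks:
        placement[random.randrange(NUM_MACHINES)].append(chunk)
    return placement

def run_machine(machine_id, partition_queues, process):
    # Drain the machine's own partition first, then steal leftover chunks from the others.
    own = machine_id % NUM_PARTITIONS
    for p in [own] + [q for q in range(NUM_PARTITIONS) if q != own]:
        queue = partition_queues[p]
        while queue:
            chunk = queue.popleft()   # a real system would make this pop atomic across machines
            process(machine_id, p, chunk)

# Hypothetical usage: 64 edge chunks, one work queue per streaming partition.
chunk_map = place_chunks(range(64))
queues = {p: deque(range(p * 8, (p + 1) * 8)) for p in range(NUM_PARTITIONS)}
run_machine(0, queues, lambda machine, partition, chunk: None)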
Data center applications like graph analytics require servers with ever larger memory capacities. DRAM scaling, however, is not able to match the increasing demand for capacity. Emerging byte-addressable, non-volatile memory technologies (NVM) offer a more scalable alternative, with memory that is directly addressable to software, but at higher latency and lower bandwidth. Using an NVM hardware emulator, we study the suitability of NVM in meeting the memory demands of four state-of-the-art graph analytics frameworks, namely Graphlab, Galois, X-Stream and Graphmat. We evaluate their performance with popular algorithms (Pagerank, BFS, Triangle Counting and Collaborative filtering) by allocating memory exclusively from DRAM (DRAM-only) or emulated NVM (NVM-only). While all of these applications are sensitive to the higher latency or lower bandwidth of NVM, resulting in performance degradation of up to 4× with NVM-only (compared to DRAM-only), we show that the performance impact is somewhat mitigated in the frameworks that exploit CPU memory-level parallelism and hardware prefetchers. Further, we demonstrate that, in a hybrid memory system with NVM and DRAM, intelligent placement of application data based on their relative importance may help offset the overheads of the NVM-only solution in a cost-effective manner (i.e., using only a small amount of DRAM). Specifically, we show that, depending on the algorithm, Graphmat can achieve close to DRAM-only performance (within 1.2×) by placing only a small fraction of its data in DRAM.
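The hybrid-memory placement idea lends itself to a small sketch; the greedy budget policy, the structure names, and the sizes and importance scores below are assumptions for illustration, not the placement scheme or data from the study.

def place_data(structures, dram_budget_bytes):
    # structures: list of (name, size_bytes, importance) tuples.
    # Greedily keep the most important data in DRAM until the budget runs out;
    # everything else falls back to NVM.
    placement, remaining = {}, dram_budget_bytes
    for name, size, _importance in sorted(structures, key=lambda s: s[2], reverse=True):
        if size <= remaining:
            placement[name] = "DRAM"
            remaining -= size
        else:
            placement[name] = "NVM"
    return placement

# Hypothetical workload: vertex state is small and hot, edge data is large and streamed.
workload = [
    ("vertex_values", 2 * 2**30, 10),   # touched every iteration
    ("edge_lists",   60 * 2**30,  3),   # read mostly sequentially
    ("scratch",       1 * 2**30,  5),
]
print(place_data(workload, dram_budget_bytes=4 * 2**30))
# -> vertex_values and scratch fit in the 4 GiB DRAM budget; edge_lists go to NVM.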
Processing large graphs is an important part of the big-data problem. Recently, a number of scale-up systems such as X-Stream, Graphchi and Turbograph have been proposed for processing large graphs using secondary storage on a single machine. The design and evaluation of these systems, however, have focused on physical machines. We expect that a natural evolution of such systems is to the cloud, where a virtual machine runs the graph processing algorithm and accesses the graph from secondary storage connected remotely over the network. We evaluate a state-of-the-art graph processing system called X-Stream in EC2 to identify challenges in this space. Our primary finding is that the network bandwidth between a virtual machine and remote storage becomes the performance bottleneck. We show that this bottleneck can be somewhat alleviated through the use of VM-local instance storage, network provisioning and compression.
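A back-of-the-envelope sketch of the compression argument is shown below; the bandwidth, compression ratio, and graph size are assumed values, not measurements from the evaluation.

# If the VM-to-remote-storage link is the bottleneck, compressing the edge stream
# lets the same link deliver proportionally more graph data per second.
network_bw_mbps   = 1_000        # assumed network bandwidth to remote storage (Mbit/s)
compression_ratio = 2.5          # assumed raw:compressed size ratio for edge data
edge_bytes        = 100 * 2**30  # assumed 100 GiB edge list

raw_seconds        = edge_bytes * 8 / (network_bw_mbps * 1e6)
compressed_seconds = raw_seconds / compression_ratio
print(f"stream raw edges:        {raw_seconds / 60:.1f} min")
print(f"stream compressed edges: {compressed_seconds / 60:.1f} min (ignoring decompression CPU cost)")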
Current cluster computing frameworks suffer from load imbalance and limited parallelism due to skewed data distributions, processing times, and machine speeds. We observe that the underlying cause of these issues in current systems is that they partition work statically. Hurricane is a high-performance large-scale data analytics system that successfully tames skew in novel ways. Hurricane performs adaptive work partitioning based on the load observed by nodes at runtime. Overloaded nodes can spawn clones of their tasks at any point during their execution, with each clone processing a subset of the original data. This allows the system to adapt to load imbalance and dynamically adjust task parallelism to gracefully handle skew. We support this design by spreading data across all nodes and allowing nodes to retrieve data in a decentralized way. The result is that Hurricane automatically balances load across tasks, ensuring fast completion times. We evaluate Hurricane's performance on typical analytics workloads and show that it significantly outperforms state-of-the-art systems for both uniform and skewed datasets, because it ensures good CPU and storage utilization in all cases.
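The task-cloning mechanism can be sketched roughly as follows; the function name, skew threshold, and clone limit are illustrative assumptions rather than Hurricane's actual API or policy.

def maybe_clone(task, peer_remaining, max_clones=4, skew_factor=2.0):
    # task: dict with 'id' and 'data' (remaining items to process).
    # peer_remaining: median number of items left on the other nodes.
    # Returns the tasks to schedule: either the original task, or clones that
    # each take a disjoint slice of the remaining input.
    remaining = len(task["data"])
    if remaining <= skew_factor * peer_remaining:
        return [task]                      # no significant skew, keep the task as-is
    n = min(max_clones, max(2, remaining // max(peer_remaining, 1)))
    step = (remaining + n - 1) // n
    return [
        {"id": f"{task['id']}.{i}", "data": task["data"][i * step:(i + 1) * step]}
        for i in range(n)
    ]

# Hypothetical usage: one straggler holding 10x the median remaining work.
clones = maybe_clone({"id": "reduce-7", "data": list(range(10_000))}, peer_remaining=1_000)
print([(c["id"], len(c["data"])) for c in clones])   # four clones of 2,500 items each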