Distributed join algorithms on thousands of cores

Barthels, Claude; Müller, Ingo; Schneider, Timo; Alonso, Gustavo; Hoefler, Torsten

doi:10.14778/3055540.3055545

Cited by 66 publications

(41 citation statements)

References 29 publications


“…The shuffling operator is the only operator that will transmit and receive data over the network. Slow networks can be a bottleneck for parallel database systems [33] and data shuffling has been shown to be a significant contributor to the end-to-end query response time [2,3,37].…”

Section: Data Shuffling In Parallel Database Systems (mentioning)

confidence: 99%

“…Still, about 30% of the cycles are idle and would be devoted to other activities in a well-designed database system. Barthels et al [2] Figure 10(b) MESQ/SR vs. IPoIB with 16 nodes in the FDR cluster). As also seen in the repartition pattern, the MESQ/SR algorithm shows good scalability in the FDR cluster while the MQ algorithms degrade.…”

Section: Throughput When Scaling Out (mentioning)

confidence: 99%

“…We contribute six designs of the data shuffling operator that represent different trade-offs between (1) the number of open connections, (2) the contention for the shared network interface, (3) the RDMA transport function, and (4) how much memory should be reserved to shuffle data between nodes during query processing. We adopt the popular pull-based operator interface to permit database systems implementors to use the proposed techniques without radically redesigning their existing analytical processing engines.…”

Section: Introduction (mentioning)

confidence: 99%


Design and Evaluation of an RDMA-aware Data Shuffling Operator for Parallel Database Systems

Liu, Yin, Blanas. 2017. Proceedings of the Twelfth European Conference on Computer Systems


The commoditization of high-performance networking has sparked research interest in the RDMA capability of this hardware. One-sided RDMA primitives, in particular, have generated substantial excitement due to the ability to directly access remote memory from within an application without involving the TCP/IP stack or the remote CPU. This paper considers how to leverage RDMA to improve the analytical performance of parallel database systems. To shuffle data efficiently using RDMA, one needs to consider a complex design space that includes (1) the number of open connections, (2) the contention for the shared network interface, (3) the RDMA transport function, and (4) how much memory should be reserved to exchange data between nodes during query processing. We contribute six designs that capture salient trade-offs in this design space. We comprehensively evaluate how transport-layer decisions impact the query performance of a database system for different generations of InfiniBand. We find that a shuffling operator that uses the RDMA Send/Receive transport function over the Unreliable Datagram transport service can transmit data up to 4× faster than an RDMA-capable MPI implementation in a 16-node cluster. The response time of TPC-H queries improves by as much as 2×.


“…For the case where the result is larger than a private cache, but smaller than the combined shared cache of all threads, Cieslewicz and Ross [11] show that SHAREDAGGREGATION, which uses a shared (lock-free) hash table, may be a better solution than the other two, at least in the absence of skew. Similar techniques have been proposed for JOIN and SORT operators [4,5,6,7,9]. As we show in this paper, these techniques alone are not sufficient for reproducible floating-point numbers.…”

Section: Related Work (mentioning)

confidence: 73%

“…As a first solution for reproducible floating-point aggregation with GROUPBY, we propose a data type that can be used as drop-in replacement for intermediate aggregates of floating-point numbers in any state-of-the-art aggregation algorithm with little to no modification. We base this type on the…”

Section: A Reproducible Floating-point Type (mentioning)

confidence: 99%

Reproducible Floating-Point Aggregation in RDBMSs

Müller, Arteaga, Hoefler, et al. 2018. 2018 IEEE 34th International Conference on Data Engineering (ICDE)

Self Cite


Industry-grade database systems are expected to produce the same result if the same query is repeatedly run on the same input. However, the numerous sources of non-determinism in modern systems make reproducible results difficult to achieve. This is particularly true if floating-point numbers are involved, where the order of the operations affects the final result. As part of a larger effort to extend database engines with data representations more suitable for machine learning and scientific applications, in this paper we explore the problem of making relational GROUPBY over floating-point formats bit-reproducible, i.e., ensuring any execution of the operator produces the same result up to every single bit. To that aim, we first propose a numeric data type that can be used as drop-in replacement for other number formats and is, unlike standard floating-point formats, associative. We use this data type to make state-of-the-art GROUPBY operators reproducible, but this approach incurs a slowdown between 4× and 12× compared to the same operator using conventional database number formats. We thus explore how to modify existing GROUPBY algorithms to make them bit-reproducible and efficient. By using vectorized summation on batches and carefully balancing batch size, cache footprint, and preprocessing costs, we are able to reduce the slowdown due to reproducibility to a factor between 1.9× and 2.4× of aggregation in isolation and to a mere 2.7% of end-to-end query performance even on aggregation-intensive queries in MonetDB. We thereby provide a solid basis for supporting more reproducible operations directly in relational engines. This document is an extended version of an article currently in print for the proceedings of ICDE'18 with the same title and by the same authors. The main additions are more implementation details and experiments.
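The abstract's point that the order of floating-point operations affects the result can be illustrated with a minimal Python sketch. This is not the paper's data type or algorithm: `naive_sum` is a hypothetical helper written for this illustration, and the standard library's `math.fsum` stands in only as an example of an order-independent summation.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation, as a simple aggregation loop might do it."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four values, aggregated in two different orders.
values = [1e16, 1.0, -1e16, 1.0]
forward = naive_sum(values)            # small terms are absorbed at different points
reordered = naive_sum(sorted(values))  # same multiset, different order

# math.fsum tracks exact partial sums, so its result does not depend on order.
exact_forward = math.fsum(values)
exact_reordered = math.fsum(sorted(values))

print(forward, reordered)              # the two naive sums disagree
print(exact_forward, exact_reordered)  # the two fsum results agree
```

A parallel GROUPBY sees its input in whatever order threads or nodes deliver it, so a naive accumulator can return different bits from run to run, whereas an associative accumulator cannot.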


Twister2: Design of a big data toolkit

Kamburugamuve, Govindarajan, Wickramasinghe, et al. 2019. Concurrency and Computation


Data-driven applications are essential to handle the ever-increasing volume, velocity, and veracity of data generated by sources such as the Web and Internet of Things (IoT) devices. Simultaneously, an event-driven computational paradigm is emerging as the core of modern systems designed for database queries, data analytics, and on-demand applications. Modern big data processing runtimes and asynchronous many task (AMT) systems from the high performance computing (HPC) community have adopted a dataflow event-driven model. Services are increasingly moving to an event-driven model in the form of Function as a Service (FaaS) for service composition. An event-driven runtime designed for data processing consists of well-understood components such as communication, scheduling, and fault tolerance. Different design choices adopted by these components determine the type of applications a system can support efficiently. We find that modern systems are limited to specific sets of applications because they have been designed with fixed choices that cannot be changed easily. In this paper, we present a loosely coupled component-based design of a big data toolkit where each component can have different implementations to support various applications. Such a polymorphic design would allow services and data analytics to be integrated seamlessly and expand from edge to cloud to HPC environments.
