We have developed a highly efficient and simple parallel hardware design for merging two sorted lists residing in banked (or multi-ported) memory. The FPGA implementation uses half the hardware resources of the current state-of-the-art architecture, while achieving better performance and half the latency for the same degree of parallelism. The main challenges for merge operations on FPGAs have been low clock frequency, caused by the merger's feedback datapath being the critical path for timing, and the high resource utilisation of recent attempts to eliminate that feedback datapath. Our solution uses a modified version of the bitonic merge block, as found in a bitonic sorter, repurposed to perform a parallel merge on streaming data. As with the state of the art, it can be considered feedback-less, since it nests only one parallel comparison for any desired level of parallelism. This yields designs with a high operating frequency, 1.3 times higher than the previous best on our test platform. Since the new design uses half the hardware resources, it allows more parallelism and leaves room for additional logic.
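To make the idea concrete, below is a minimal software sketch of the bitonic merge principle the abstract refers to; the function name, the Python formulation, and the P = 4 example are our own illustrative assumptions, not the paper's RTL design.

```python
# Sketch: merging two ascending chunks of length P with a bitonic merge network.
# Reversing one chunk makes the concatenation bitonic; log2(2P) compare-exchange
# stages then produce a fully sorted sequence. In hardware, all compare-exchange
# pairs within a stage are independent and can run in parallel in one cycle.
def bitonic_merge(a, b):
    seq = a + b[::-1]              # ascending + descending = bitonic
    n = len(seq)                   # n = 2P, assumed to be a power of two
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                if seq[i] > seq[i + half]:
                    seq[i], seq[i + half] = seq[i + half], seq[i]
        half //= 2
    return seq

# Example with P = 4:
print(bitonic_merge([1, 4, 7, 9], [2, 3, 8, 10]))   # [1, 2, 3, 4, 7, 8, 9, 10]
```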
This work improves on recent research on sorting acceleration on FPGAs. An efficient design is introduced for sorting data that fit on chip, with additional functionality to merge sorted sublists recursively for inputs of arbitrary length. While many-leaf mergers are conventionally single-rate, a novel technique in our approach is to use parallel merging only for the last stages of the merge tree, enabling a bandwidth-adapted, multi-rate many-leaf merge. Our open-source RTL generator produces sorting peripherals with customisable parallelism and data format. We evaluate our FPGA design as a 128-bit-wide peripheral on an MPSoC platform, achieving a speedup of up to 49 times over the A53 core for sorting, and up to 27 times for our specialised database analytics application.
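As a software-level illustration of merging sorted sublists recursively (a pairwise merge tree), the sketch below uses our own hypothetical function names and plain Python lists; it is not the generator's multi-rate hardware merge tree, only the underlying idea.

```python
# Standard two-way merge of two ascending lists.
def merge_two(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

# Merge sorted sublists pairwise, level by level, until a single sorted list remains.
def merge_tree(sublists):
    while len(sublists) > 1:
        nxt = []
        for k in range(0, len(sublists), 2):
            if k + 1 < len(sublists):
                nxt.append(merge_two(sublists[k], sublists[k + 1]))
            else:
                nxt.append(sublists[k])
        sublists = nxt
    return sublists[0] if sublists else []

print(merge_tree([[3, 9], [1, 4], [2, 8], [5, 7]]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```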
Database systems are key to a variety of applications, and FPGA-based accelerators have shown promise in supporting high-performance database systems. This survey presents a systematic review of research on accelerating analytical database systems using FPGAs. The review covers database acceleration frameworks and accelerator implementations for various database operators. Finally, the survey discusses promising future technologies and the challenges to be addressed by future research in this area.
We present different methods to increase the performance of Hybrid Monte Carlo simulations of the Hubbard model in two dimensions. Our simulations concentrate on a hexagonal lattice, though they can easily be generalized to other lattices. We find that the best results are achieved using a flexible GMRES solver for matrix inversions and the second-order Omelyan integrator with Hasenbusch acceleration on different time scales for the molecular dynamics. We demonstrate how an arbitrary number of Hasenbusch mass terms can be included in this geometry and find that the optimal speed depends only weakly on the number of Hasenbusch masses and their values. As such, the tuning of these masses is amenable to automation, and we present an algorithm for this tuning based on the known dependence of solver time and forces on the Hasenbusch masses. We benchmark our algorithms against systems where direct numerical diagonalization is feasible and find excellent agreement. We also simulate systems with hexagonal lattice dimensions up to 102 × 102 and N_t = 64. We find that the Hasenbusch algorithm leads to a speedup of more than an order of magnitude.
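For context, the Hasenbusch acceleration mentioned above rests on a standard determinant factorisation; the notation below is our own generic form, not taken from the paper, whose operators and mass parameters may differ:

\[
\det M(m) \;=\; \det\!\big(M(m)\,M(m_1)^{-1}\big)\,\det\!\big(M(m_1)\,M(m_2)^{-1}\big)\cdots\det M(m_n),
\qquad m < m_1 < \dots < m_n .
\]

Each ratio receives its own pseudofermion field and can be integrated on its own molecular-dynamics time scale, which reduces the force from the lightest (most expensive) mass and allows larger step sizes.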
We present an efficient, high-throughput and scalable hardware design for accelerating the merge phase of the sort-merge join operation. Sort-merge join is one of the fundamental join algorithms and among the most frequently executed operations in relational databases. It has been the focus of much recent research, each effort with different shortcomings and usually focusing only on the sort phase. In this paper, a new parallel sort-merge join architecture is developed that provides data-streaming functionality and high throughput. The key idea of the paper is a novel design for a co-grouping engine, with which the input data are summarised on the fly. In this way, the operation is performed on streams of data, preserving the linear access pattern for faster data movement and eliminating the need for a replay buffer or cache. In contrast to related work, our approach makes no assumptions about the value distribution of the input data and applies to any input size and width. We evaluate the design on a Zynq UltraScale+ based platform and show a speedup of up to 3.1 times over the host processor, even without accelerating the sort phase, while still accounting for data transfers to and from main memory in Linux.
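As a minimal software analogue of merge join with on-the-fly co-grouping of equal keys (our own illustrative sketch, not the paper's streaming engine), assuming both inputs are lists of (key, payload) pairs sorted by key:

```python
# Sketch: single pass over two sorted inputs; rows sharing a key are co-grouped
# from both sides and joined before the scan advances, so neither input is
# revisited (no replay buffer needed).
def merge_join(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = left[i][0], right[j][0]
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            li, rj = i, j
            while i < len(left) and left[i][0] == kl:
                i += 1
            while j < len(right) and right[j][0] == kl:
                j += 1
            for _, a in left[li:i]:
                for _, b in right[rj:j]:
                    out.append((kl, a, b))
    return out

print(merge_join([(1, 'a'), (2, 'b'), (2, 'c')], [(2, 'x'), (3, 'y')]))
# [(2, 'b', 'x'), (2, 'c', 'x')]
```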