“…This leaves parallelism as the only option for fast processing of the growing amounts of memory-resident data. The computer architecture community has considered two approaches to leveraging parallelism, namely (i) off-the-shelf multi-core architectures, including CPUs and GPUs [2,10], or (ii) customizable architectures such as CPUs with FPGAs [14,18,13,16,21,20]. While multi-cores typically have much higher clock speeds, specialized hardware (e.g., FPGAs) offers the advantages of both customization (the hardware design is optimized for a specific application) and parallelism.…”