Summary
The GPGPU paradigm has recently been employed to accelerate the processing of large amounts of data by exploiting the massive parallelism offered by modern GPUs. To date, several techniques have been proposed for implementing simple select, aggregate, and equality join operations on GPUs. In this paper, we study the efficient implementation of theta‐join queries between two relations using the CUDA framework. Theta‐joins are notoriously slow and can therefore benefit from massively parallel execution. However, their GPU‐based implementation differs significantly from that of hash‐ and sort‐based equality joins and must be carefully crafted. The implementation is driven by two main objectives. The first is to achieve highly efficient parallelization through data reuse, that is, by minimizing accesses to the slow global memory. The second is to exploit the available memory as efficiently as possible, given that, in general, it cannot hold the entire input and result. We propose a methodology for processing theta‐joins on a GPU that exploits the heterogeneous nature of GPGPU while addressing memory limitations. Furthermore, we provide a series of implementation optimizations that yield performance improvements of an order of magnitude.
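To make the data‐reuse objective concrete, the following is a minimal CUDA sketch of a tiled nested‐loop theta‐join. It assumes integer join attributes and the hypothetical predicate R.a < S.b, and it illustrates the general idea rather than the methodology proposed in this paper. Each thread owns one tuple of R, and each block cooperatively stages a tile of S in shared memory, so every S value fetched from global memory is compared against all R tuples handled by that block instead of being re‐read once per R tuple. The names `theta`, `theta_count_kernel`, `theta_join_count`, and `TILE` are hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical tile size: threads per block == S tuples staged per tile.
constexpr int TILE = 256;

// Hypothetical theta predicate (an inequality join): r.a < s.b.
__device__ __forceinline__ bool theta(int r, int s) { return r < s; }

// One thread per R tuple; the block stages TILE tuples of S in shared memory,
// so each global-memory read of S is reused by every thread in the block.
__global__ void theta_count_kernel(const int* R, int nR, const int* S, int nS,
                                   unsigned long long* counts) {
    __shared__ int sTile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int r = (i < nR) ? R[i] : 0;   // out-of-range threads still help load tiles
    unsigned long long c = 0;
    for (int base = 0; base < nS; base += TILE) {
        int j = base + threadIdx.x;
        if (j < nS) sTile[threadIdx.x] = S[j];   // cooperative load of one tile
        __syncthreads();                         // tile fully loaded
        int len = min(TILE, nS - base);
        if (i < nR)
            for (int k = 0; k < len; ++k) c += theta(r, sTile[k]);
        __syncthreads();                         // done reading before next load
    }
    if (i < nR) counts[i] = c;                   // matches per R tuple
}

// Host wrapper: returns the total number of (r, s) pairs satisfying theta.
unsigned long long theta_join_count(const std::vector<int>& R,
                                    const std::vector<int>& S) {
    if (R.empty() || S.empty()) return 0;
    int nR = static_cast<int>(R.size());
    int nS = static_cast<int>(S.size());
    int *dR = nullptr, *dS = nullptr;
    unsigned long long* dC = nullptr;
    cudaMalloc(&dR, nR * sizeof(int));
    cudaMalloc(&dS, nS * sizeof(int));
    cudaMalloc(&dC, nR * sizeof(unsigned long long));
    cudaMemcpy(dR, R.data(), nR * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dS, S.data(), nS * sizeof(int), cudaMemcpyHostToDevice);

    int blocks = (nR + TILE - 1) / TILE;
    theta_count_kernel<<<blocks, TILE>>>(dR, nR, dS, nS, dC);

    std::vector<unsigned long long> counts(nR);
    cudaError_t err = cudaMemcpy(counts.data(), dC,
                                 nR * sizeof(unsigned long long),
                                 cudaMemcpyDeviceToHost);  // also syncs the kernel
    assert(err == cudaSuccess);
    cudaFree(dR);
    cudaFree(dS);
    cudaFree(dC);

    unsigned long long total = 0;
    for (unsigned long long c : counts) total += c;
    return total;
}
```

For example, with R = {1, 5, 9} and S = {2, 4, 6, 8}, `theta_join_count` returns 6. The per‐tuple values in `counts` are what a two‐pass scheme could prefix‐sum to obtain output offsets before a second kernel writes the matching pairs; this is one common way to size a result that is not known in advance, and it is an assumption here rather than a description of the authors' approach.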