As the number of edge devices in large-scale edge systems grows, more and more data are collected for processing. In such big data scenarios, interest in main-memory analytic databases has resurged, driven by the large RAM capacity of modern servers and the increasing demand for real-time analytic platforms. In such databases, the join is at the heart of almost every query plan, and it remains a time-consuming operation whenever the overhead of denormalization is too large for denormalization to be applicable. However, current join implementations have not fully leveraged the features (e.g., SIMD, multi-core) provided by modern hardware. The goal of this article is to design efficient join algorithms by judiciously exploiting every bit of RAM and all the available parallelism in each processing unit. Hash joins in particular have been studied, improved, and reexamined over decades. In this article, we propose to utilize a secondary index to improve hash joins without physical partitioning. Specifically, in the build phase, the hash values are scattered evenly into the logical partitions of the hash table; in the probe phase, the secondary index is used as a hint to reorder the probing sequence, so that the locality of hash probing is increased. We benchmark the performance of the proposed techniques in our column-store research prototype. Extensive experiments on synthetic and real data show that our methods offer significant performance improvement over their counterparts.
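
The build and probe phases described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this article. It assumes that a logical partition is selected by the hash value modulo a fixed partition count, and that the secondary index can be approximated by grouping probe row ids by partition. The names `NUM_PARTITIONS`, `partition_of`, `build`, `probe_order`, and `probe` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of logical partitions


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a logical partition (assumption: hash modulo partition count)."""
    return hash(key) % num_partitions


def build(build_rows, num_partitions=NUM_PARTITIONS):
    """Build phase: insert every build key into its logical partition.

    Tuples are not copied into separate partition buffers first; each
    partition is only a region (here, one dict) of the same table object.
    """
    table = [defaultdict(list) for _ in range(num_partitions)]
    for row_id, key in enumerate(build_rows):
        table[partition_of(key, num_partitions)][key].append(row_id)
    return table


def probe_order(probe_rows, num_partitions=NUM_PARTITIONS):
    """Stand-in for the secondary index (assumption): probe row ids
    grouped by the logical partition their key falls into."""
    buckets = [[] for _ in range(num_partitions)]
    for row_id, key in enumerate(probe_rows):
        buckets[partition_of(key, num_partitions)].append(row_id)
    return [row_id for bucket in buckets for row_id in bucket]


def probe(table, probe_rows, num_partitions=NUM_PARTITIONS):
    """Probe phase: visit probe rows partition by partition, so that
    consecutive lookups touch the same region of the hash table."""
    matches = []
    for probe_id in probe_order(probe_rows, num_partitions):
        key = probe_rows[probe_id]
        partition = table[partition_of(key, num_partitions)]
        for build_id in partition.get(key, ()):
            matches.append((build_id, probe_id))
    return matches


build_rows = [10, 20, 30, 20]
probe_rows = [20, 40, 10, 20]
matches = probe(build(build_rows), probe_rows)
```

In this sketch the probe side is only reordered through row ids, and the probe tuples themselves stay in place. That is the sense in which the join avoids physical partitioning here. A Python list of dicts does not model cache behavior, so the sketch shows only the access order, not the performance effect.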