External sorting, a basic algorithm for big data processing, suffers from massive read and write operations in external memory. Recent works offload part of the data processing from the host to the solid state drive (SSD) to reduce data transmission. However, the internal memory of the SSD is limited, and undesirable data retention can occur during the merge phase. To improve memory efficiency, we propose an algorithm named ISort. Specifically, we build an index table between the memory and page addresses. The index table orders the pages to be read in the merge phase by their minimum values; the pages are then read into memory in that order, which reduces the data residing in memory and improves memory efficiency. Since the merge phase is performed inside the SSD, ISort can take advantage of the high IO bandwidth within the SSD to speed up the merge phase. Because channel utilization directly influences performance, we search for the optimal ratio of read and write channels by comparing the read and write performance of the “specialized channel” and the “hybrid channel”. Experimental results show that ISort maintains a better data processing speed when SSD memory is limited, outperforming other robust algorithms. In addition, the algorithm performs better with the crossover strategy than with the specialization strategy.
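The index-guided merge described above can be illustrated with a minimal sketch. It is not the authors' implementation: it runs on the host over in-memory Python lists rather than inside an SSD, and the names `build_index`, `index_merge`, and `page_size` are hypothetical. It assumes each run is already sorted and split into fixed-size pages, and that the index table stores one entry per page keyed by the page's minimum value. Pages are loaded in ascending order of that minimum, and any buffered value no larger than the minimum of the next unread page can be emitted at once, since no unread page can contain a smaller value.

```python
import heapq


def build_index(runs, page_size):
    """Return one (min_value, run_id, page_start) entry per page, sorted by min_value.

    Assumption: every run is sorted, so a page's first element is its minimum.
    """
    index = []
    for run_id, run in enumerate(runs):
        for start in range(0, len(run), page_size):
            index.append((run[start], run_id, start))
    index.sort()
    return index


def index_merge(runs, page_size):
    """Merge sorted runs by loading pages in index order.

    Returns the merged list and the peak number of values buffered at once,
    a stand-in for the memory pressure the abstract refers to.
    """
    index = build_index(runs, page_size)
    buffer = []  # min-heap of values that are loaded but not yet emitted
    merged = []
    peak = 0
    for i, (_, run_id, start) in enumerate(index):
        # "Read" the page into the buffer.
        for value in runs[run_id][start:start + page_size]:
            heapq.heappush(buffer, value)
        peak = max(peak, len(buffer))
        # Minimum of the next unread page, or None once every page is loaded.
        bound = index[i + 1][0] if i + 1 < len(index) else None
        # Values at or below the bound cannot be preceded by any unread value.
        while buffer and (bound is None or buffer[0] <= bound):
            merged.append(heapq.heappop(buffer))
    return merged, peak


runs = [[1, 4, 7, 10], [2, 3, 8, 9], [5, 6, 11, 12]]
merged, peak = index_merge(runs, page_size=2)
print(merged, peak)
```

In this toy input the buffer never holds more than a few values even though twelve are merged, which is the effect the index table is meant to achieve: data is brought into memory only shortly before it can be written out.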