“…Whereas quicksort is the traditional choice for single-threaded applications, a number of its variations, as well as variations of bucket sort, radix sort, sample sort, and many others, have been proposed; they differ in scalability, computational time, communication complexity, and memory bandwidth. Implementations on a CPU, a multi-core cluster [26], a GPU [46], and even an FPGA have been reported.

a) Proof of Proposition 7: For our purpose, a modification of bucket sort, also called sample sort, suffices.…”
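
To make the idea of sample sort concrete, the following is a minimal sequential sketch in Python. It is not the variant used in the proof, which the excerpt does not show. It only illustrates the general technique: draw a random sample, pick splitters from the sorted sample, distribute the elements into buckets delimited by the splitters, and sort each bucket independently. The function name `sample_sort` and the parameters `num_buckets`, `oversample`, and `seed` are assumptions introduced here for illustration.

```python
import bisect
import random


def sample_sort(values, num_buckets=4, oversample=8, seed=0):
    """Sequential sketch of sample sort (illustrative, not the paper's variant).

    num_buckets, oversample, and seed are hypothetical parameters chosen
    for this example only.
    """
    values = list(values)
    # Tiny inputs (or a degenerate bucket count) are sorted directly.
    if num_buckets < 2 or len(values) <= num_buckets:
        return sorted(values)

    rng = random.Random(seed)

    # 1. Draw a random sample (without replacement) and sort it.
    sample_size = min(len(values), num_buckets * oversample)
    sample = sorted(rng.sample(values, sample_size))

    # 2. Choose num_buckets - 1 splitters, evenly spaced in the sorted sample.
    step = len(sample) / num_buckets
    splitters = [sample[int(i * step)] for i in range(1, num_buckets)]

    # 3. Distribute every element into the bucket delimited by the splitters.
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[bisect.bisect_right(splitters, v)].append(v)

    # 4. Sort each bucket independently and concatenate in bucket order.
    #    In a parallel setting, each bucket would go to its own processor.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

Because every element of bucket k is no larger than every element of bucket k+1, concatenating the sorted buckets yields a sorted sequence. The oversampling factor is a common heuristic for keeping bucket sizes balanced; the value 8 here is arbitrary.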