Sorting is one of the most studied and most challenging problems across scientific research. Although many techniques and algorithms have been proposed for efficient parallel sorting, achieving the desired performance on different architectures with a large number of processors remains challenging. The level of parallelism in an application can be maximized by minimizing the overheads due to load imbalance and the waiting time due to memory latencies. In this paper, we present a distributed sorting algorithm implemented in PGX.D, a fast distributed graph processing system, which outperforms Spark's distributed sorting implementation by around 2x-3x by hiding communication latencies and minimizing unnecessary overheads. Furthermore, we show that the proposed PGX.D sorting method handles datasets containing many duplicated data entries efficiently and keeps workloads balanced across different input data distribution types.
Index Terms-Distributed sorting method, PGX.D distributed graph framework, Graph.
* This work was done during the author's internship at Oracle Labs.
In this research, we propose a new distributed sorting method that overcomes these challenges by keeping the load balanced and by minimizing overheads through efficient data fetching in the partitioning and merging steps. A new handler is proposed that keeps merging balanced while the merging steps are parallelized, which improves parallel performance. Moreover, a new investigator is proposed that keeps workloads balanced among the distributed processors when the dataset contains many duplicated data entries. The method is implemented in PGX.D, a scalable framework for various distributed implementations.
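To make the concern about duplicated entries concrete, the following is a minimal single-process sketch. It is not the PGX.D implementation, whose handler and investigator are not detailed in this section. It assumes a generic sample-sort style range partitioning, and the names `pick_splitters`, `naive_partition`, and `balanced_partition`, as well as the round-robin tie rule, are hypothetical illustrations of why duplicated keys can overload one processor and of one common remedy.

```python
from bisect import bisect_left, bisect_right


def pick_splitters(data, num_parts):
    """Pick num_parts - 1 splitters at evenly spaced ranks of the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    return [ordered[(i * n) // num_parts] for i in range(1, num_parts)]


def naive_partition(data, splitters):
    """Route each key by its rank among the splitters.

    Every copy of a duplicated key lands in the same bucket.
    """
    buckets = [[] for _ in range(len(splitters) + 1)]
    for key in data:
        buckets[bisect_right(splitters, key)].append(key)
    return buckets


def balanced_partition(data, splitters):
    """Route keys as above, but spread keys that equal a splitter.

    A key equal to one or more splitters may go to any bucket in lo..hi
    without breaking the global order, so copies are dealt round-robin.
    """
    buckets = [[] for _ in range(len(splitters) + 1)]
    seen = {}  # how many copies of each key have been placed so far
    for key in data:
        lo = bisect_left(splitters, key)   # number of splitters < key
        hi = bisect_right(splitters, key)  # number of splitters <= key
        turn = seen.get(key, 0)
        seen[key] = turn + 1
        buckets[lo + turn % (hi - lo + 1)].append(key)
    return buckets


# Hypothetical input: one heavily duplicated key plus a few distinct keys.
data = [5] * 8 + [1, 2, 3, 9]
splitters = pick_splitters(data, 4)  # all three splitters equal 5 here
naive_sizes = [len(b) for b in naive_partition(data, splitters)]
balanced_sizes = [len(b) for b in balanced_partition(data, splitters)]
```

In a distributed setting each bucket would be sent to a different processor and sorted locally. For this input the naive rule produces bucket sizes 3, 0, 0, 9, while the round-robin rule produces 5, 2, 2, 3, and concatenating the locally sorted buckets still yields a globally sorted sequence.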
PGX.D [7], [8] is a fast, parallel, and distributed graph analytics framework that can process large graphs in distributed environments while keeping workloads well balanced among the distributed machines. It improves the performance of the proposed sorting technique by exposing a programming model that intrinsically avoids poor resource utilization by maintaining balanced workloads, minimizes latencies by managing parallel tasks efficiently, and provides asynchronous task execution for sending data to and receiving data from remote processors. The results presented in [7] show that PGX.D has a low-overhead, bandwidth-efficient communication framework that easily supports remote data pulling patterns and is about 3x-90x faster than other distributed graph systems such as GraphLab. Moreover, PGX.D decreases communication overheads by delaying unnecessary computations until the end of the current step, which allows other processes to continue without waiting for all previous computations to complete. It also allows asynchronous local and remote requests, which avoids unnecessary synchronization barriers and helps increase the scalability of the distributed sorting method [9].
In this paper, we s...