Radix sort stands out for having a better worst-case theoretical bound than any comparison-based sort for fixed-length integers. Although radix sort can be implemented either in-place or in parallel, there exists no parallel in-place implementation of radix sort that guarantees a sublinear worst-case span. The challenge arises from read-write races when reading from and writing to the same array. In this thesis, I introduce Regions sort and use it to implement a parallel, work-efficient, in-place radix sorting algorithm. To sort n integers from a range r, given a parameter K, the algorithm requires only O(K log r log n) auxiliary memory, O(n log r) work, and O((n/K + log n) log r) span. Regions sort uses a divide-and-conquer paradigm. It first divides the array into sub-arrays, each of which is sorted independently in parallel. This decreases the irregularity in the input and reorders it into a set of regions, each of which has similar properties. Next, it builds a data structure, called the regions graph, that represents the data movements needed to completely sort the array. Finally, Regions sort iteratively plans swaps that satisfy the required data movements. The algorithm then recurses on records whose keys are so far equal in order to break ties. I compare two variants of Regions sort with state-of-the-art integer sort and comparison sort algorithms. Experiments show that the best variant of Regions sort usually outperforms the other sorting algorithms. In addition, further experiments show that Regions sort maintains its advantage regardless of input distribution and range. Moreover, the single-threaded implementation of Regions sort outperforms the highly optimized Ska Sort implementation of serial in-place radix sort.
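The phases described above (local sorting of sub-arrays, building a regions graph, planning swaps, then recursing on ties) can be illustrated with a minimal serial sketch. This is an illustration under stated assumptions, not the thesis implementation: it runs on one thread, the local sort uses Python's `sorted()` as a stand-in and is therefore not in place, and the swap planning simply pairs one incoming region with one outgoing region per bucket, which is one plausible way to satisfy the required data movements. The names `regions_pass` and `regions_sort` and the parameters `bits`, `key_bits`, and `num_blocks` are hypothetical.

```python
def regions_pass(a, lo, hi, shift, bits, num_blocks):
    """Partition a[lo:hi] by one digit; return the bucket boundaries."""
    radix = 1 << bits
    key = lambda x: (x >> shift) & (radix - 1)
    n = hi - lo

    # Phase 1: sort each block locally by the current digit.
    # Assumption: sorted() is a simple stand-in and is NOT in place.
    block = max(1, -(-n // num_blocks))
    for s in range(lo, hi, block):
        e = min(s + block, hi)
        a[s:e] = sorted(a[s:e], key=key)

    # Phase 2: global histogram gives each bucket's final index range.
    counts = [0] * radix
    for i in range(lo, hi):
        counts[key(a[i])] += 1
    starts = [lo]
    for c in counts:
        starts.append(starts[-1] + c)

    # Phase 3: regions graph. A region is a maximal run lying inside
    # bucket `home` whose keys all belong in bucket `dest`; each region
    # with home != dest is an edge home -> dest weighted by its length.
    regions = []  # entries: [start, length, home, dest]
    home, i = 0, lo
    while i < hi:
        while starts[home + 1] <= i:
            home += 1
        dest, j = key(a[i]), i
        while j < starts[home + 1] and key(a[j]) == dest:
            j += 1
        if dest != home:
            regions.append([i, j - i, home, dest])
        i = j

    # Phase 4: for each bucket v, pair regions that must move INTO v
    # with regions inside v that must move OUT, and swap them.
    for v in range(radix):
        incoming = [r for r in regions if r[3] == v]
        outgoing = [r for r in regions if r[2] == v]
        created = []
        while incoming and outgoing:
            src, dst = incoming[-1], outgoing[-1]
            m = min(src[1], dst[1])
            for t in range(m):
                p, q = src[0] + t, dst[0] + t
                a[p], a[q] = a[q], a[p]
            # Slots vacated in src's bucket now hold keys bound for dst[3].
            if dst[3] != src[2]:
                created.append([src[0], m, src[2], dst[3]])
            for r, lst in ((src, incoming), (dst, outgoing)):
                r[0] += m
                r[1] -= m
                if r[1] == 0:
                    lst.pop()
        regions = [r for r in regions if r[1] > 0] + created
    return starts


def regions_sort(a, key_bits=32, bits=8, num_blocks=4):
    """Sort non-negative ints below 2**key_bits, most significant digit
    first. Assumes key_bits is a multiple of bits."""
    def rec(lo, hi, shift):
        if hi - lo < 2 or shift < 0:
            return
        starts = regions_pass(a, lo, hi, shift, bits, num_blocks)
        # Records whose digits are equal so far are tie-broken recursively.
        for d in range(1 << bits):
            rec(starts[d], starts[d + 1], shift - bits)
    rec(0, len(a), key_bits - bits)
```

In this sketch, the number of elements waiting to enter a bucket always equals the number of foreign elements inside it, so the incoming and outgoing lists for a bucket run out at the same time. In a parallel setting, the local sorts of Phase 1 are the part that can run independently per sub-array.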