Objective. The OSort algorithm, a pivotal unsupervised spike sorting method, has been implemented in dedicated hardware devices for real-time spike sorting. However, due to the inherent complexity of neural recording environments, OSort still produces numerous transient clusters during practical sorting. This leads to substantial memory usage, heavy computational load, and complex hardware architectures, especially in noisy recordings and multi-channel systems. Approach. This study introduces an optimized OSort algorithm (opt-OSort) that uses the correlation coefficient (CC), instead of Euclidean distance, as the classification criterion. The CC method not only improves the robustness of spike classification under diverse and changing physiological and recording noise conditions, but also completes the entire sorting procedure within a fixed number of cluster slots, thus preventing a large number of transient clusters. Moreover, opt-OSort incorporates two configurable validation loops to efficiently reject cluster outliers and track recording variations caused by electrode drift in real time. Main Results. Opt-OSort reduces transient cluster occurrences by two orders of magnitude and decreases memory usage, measured as the number of pre-allocated transient clusters, by a factor of 2.5 to 80 compared with other hardware implementations of OSort. It maintains an accuracy comparable to offline OSort and other commonly used algorithms, with a sorting time of 0.68 µs as measured on the hardware-implemented system for both simulated datasets and experimental data. The ability of opt-OSort to handle variations in neural activity caused by electrode drift is also demonstrated. Significance. These results present a rapid, precise, and robust spike sorting solution suitable for integration into low-power, portable, closed-loop neural control systems and brain-computer interfaces.
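Illustrative sketch. To make the idea of CC-based classification within a fixed number of cluster slots concrete, the following minimal Python sketch assigns each spike to the cluster template with the highest Pearson correlation, opens a new cluster only while a slot is free, and otherwise leaves the spike unassigned. It is not the opt-OSort implementation. The class name `FixedSlotCCSorter`, the threshold of 0.9, the slot count, the running-mean template update, and the rule for a full slot table are all assumptions chosen for illustration, and the two validation loops are not modelled.

```python
import math


def pearson_cc(a, b):
    """Pearson correlation coefficient of two equal-length waveforms."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat waveform: treat as uncorrelated
    return sum(x * y for x, y in zip(da, db)) / denom


class FixedSlotCCSorter:
    """Hypothetical online sorter with a fixed number of cluster slots."""

    def __init__(self, n_slots=4, cc_threshold=0.9):
        self.n_slots = n_slots            # assumed slot budget
        self.cc_threshold = cc_threshold  # assumed similarity threshold
        self.templates = []               # mean waveform per cluster
        self.counts = []                  # spikes assigned per cluster

    def classify(self, spike):
        """Return the cluster index for `spike`, or -1 if it is left unassigned."""
        best_idx, best_cc = -1, -1.0
        for idx, template in enumerate(self.templates):
            cc = pearson_cc(spike, template)
            if cc > best_cc:
                best_idx, best_cc = idx, cc

        if best_idx >= 0 and best_cc >= self.cc_threshold:
            # Similar enough: fold the spike into the running-mean template.
            n = self.counts[best_idx]
            old = self.templates[best_idx]
            self.templates[best_idx] = [(t * n + s) / (n + 1) for t, s in zip(old, spike)]
            self.counts[best_idx] = n + 1
            return best_idx

        if len(self.templates) < self.n_slots:
            # No match, but a slot is free: start a new cluster.
            self.templates.append(list(spike))
            self.counts.append(1)
            return len(self.templates) - 1

        # No match and no free slot (assumed policy: leave unassigned).
        return -1


# Usage with made-up waveforms: the third spike is unit_a at twice the amplitude.
sorter = FixedSlotCCSorter(n_slots=2, cc_threshold=0.9)
unit_a = [0.0, 1.0, 4.0, 1.0, 0.0, -1.0]
unit_b = [0.0, -2.0, -1.0, 3.0, 1.0, 0.0]
labels = [sorter.classify(s) for s in (unit_a, unit_b, [2 * x for x in unit_a])]
outlier_label = sorter.classify([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
```

In this toy run the amplitude-scaled spike joins the first cluster, because the correlation coefficient depends on waveform shape and not on scale. A Euclidean distance criterion would treat the same spike as far from its template. The final spike matches neither template and, with both slots occupied, is left unassigned, so no extra cluster is created.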