Multi-Electrode Arrays (MEAs) and High-Density Multi-Electrode Arrays (HDMEAs) of sensors are key instruments in neuroscience research. Such devices are evolving to provide ever-increasing temporal and spatial resolution, paving the way to unprecedented results in understanding the behaviour of neuronal networks and interacting with them. However, some experiments require in-place, low-latency processing of the sensor data acquired by the arrays. This creates a need for high-performance embedded computing platforms capable of processing, in real time, the stream of samples produced by the acquisition front-end to extract higher-level information. Previous work has demonstrated that Field-Programmable Gate Array (FPGA) and All-Programmable System-On-Chip (APSoC) devices are a suitable target technology for implementing real-time processors of HDMEA data. However, the approaches available in the literature can process only a limited number of channels or are designed to execute only the first steps of the neural signal processing chain. In this work, we propose an APSoC-based implementation capable of sorting the neural spikes acquired by the sensors, associating the shape of each spike with a specific firing neuron. Our system, implemented on a Xilinx Z7020 APSoC, is capable of executing on-line spike sorting on up to 5500 acquisition channels, 43× more than state-of-the-art alternatives, while supporting an 18 kHz acquisition frequency. We present an experimental study on a commonly used reference dataset, using on-line refinement of the sorting clusters to improve accuracy up to 82%, with only 4% degradation with respect to off-line analysis.

INDEX TERMS Field programmable gate arrays, Signal processing, Neural engineering, APSoC, HDMEA, Spike sorting.