Modern brain-computer interfaces and neural implants allow interaction between the tissue, the user and the environment, where people suffer from neurodegenerative diseases or injuries. This interaction can be achieved by using penetrating/invasive microelectrodes for extracellular recordings and stimulation, such as Utah or Michigan arrays. The application-specific signal processing of the extracellular recording enables the detection of interactions and enables user interaction. For example, it allows to read out movement intentions from recordings of brain signals for controlling a prosthesis or an exoskeleton. To enable this, computationally complex algorithms are used in research that cannot be executed onchip or on embedded systems. Therefore, an optimization of the end-to-end processing pipeline, from the signal condition on the electrode array over the analog pre-processing to spike-sorting and finally the neural decoding process, is necessary for hardware
inference in order to enable a local signal processing in real-time and to enable a compact system for achieving a high comfort level. is This paper presents a survey of system architectures and algorithms for end-to-end signal processing pipelines of neural activity on the hardware of such neural devices, including (i) on-chip signal pre-processing, (ii) spike-sorting on-chip or on embedded hardware and (iii) neural decoding on workstations. A particular focus for the hardware implementation is on low-power electronic design and artifact-robust algorithms with low computational effort and very short latency. For this, current challenges and possible solutions with support of novel machine learning techniques are presented in brief. In addition, we describe our future vision for next-generation BCIs.