When incorporated into intelligent control systems, hardware-based artificial neural networks (HNNs) [1][2][3] present a significant advantage over their software counterparts, especially in terms of speed. The most ubiquitous approach to implementing convolutional neural networks (CNNs), consisting of multilayer perceptrons, [4][5][6][7][8] is with software algorithms executed on von Neumann machines. [9,10] These networks are used in a wide variety of applications, such as image classification. Such algorithms are serial in nature, meaning that a central processing unit (CPU) executes them sequentially. As CNN operations involve a large number of vector-matrix multiplications, application-specific integrated circuits (ASICs) [11,12] were later used to implement parallel HNNs. This significant merit did not come without a cost: the main drawback was, and still is, the complexity of the design flow. For the frontend, high-level design to succeed, strict criteria must be met to ensure timing closure of the backend implementation. This in turn led to field-programmable gate arrays (FPGAs) being configured as HNNs. [13][14][15][16][17] FPGA platforms offer more flexibility than ASICs, at the cost of lower performance and a lower effective gate count. [18][19][20] Because of these limitations, some FPGA-based HNNs had to rely on low-precision arithmetic implementations, [21][22][23] while ASICs could end up with a very large gate count. As a workaround, several HNNs relied on offline training schemes implemented in dedicated software. Such dependencies limited their ability to operate autonomously or to undergo incremental learning while remaining online.

In recent years, two approaches were developed for the implementation of bioinspired robotic vision. The first was based on event-triggered sensor arrays, [24][25][26] and the second used specialized software. [27] Using the first approach, spiking cameras were demonstrated to capture fast temporal changes. [24] These devices rely on specialized ASIC implementations, with all the associated design complexity. The software algorithms, in turn, were usually developed in accordance with the two-visual-pathway theory, which states that the visual cortex is split into dorsal and ventral streams. [28] These algorithms were built around a CNN consisting of a hierarchical feed-forward data-processing path containing two successive pairs of a convolution operator followed by a pooling layer (a minimal sketch is given below). [29] However, they were
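To make the data path concrete, the following minimal NumPy sketch applies two such [convolution → pooling] stages in sequence. The image size, kernel sizes, and ReLU activation are illustrative assumptions rather than details of the cited models; the sketch only shows the generic feed-forward structure and why each layer reduces to many multiply-accumulate (vector-matrix) operations, the workload that motivates the parallel ASIC and FPGA implementations discussed above.

```python
# Minimal sketch of a two-stage convolution + pooling feed-forward path.
# All sizes and the activation function are illustrative assumptions.
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution of a single-channel image with one kernel.
    Each output pixel is a dot product, so the layer amounts to a large
    number of multiply-accumulate operations."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    flipped = kernel[::-1, ::-1]          # flip kernel for true convolution
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

def max_pool2x2(x):
    """2x2 max pooling with stride 2 (odd rows/columns are trimmed)."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def relu(x):
    return np.maximum(x, 0.0)

# Two [convolution -> pooling] pairs applied in sequence.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))     # stand-in for one camera frame
k1 = rng.standard_normal((5, 5))          # first-stage kernel (assumed size)
k2 = rng.standard_normal((3, 3))          # second-stage kernel (assumed size)

stage1 = max_pool2x2(relu(conv2d(image, k1)))   # 32x32 -> 28x28 -> 14x14
stage2 = max_pool2x2(relu(conv2d(stage1, k2)))  # 14x14 -> 12x12 -> 6x6
print(stage1.shape, stage2.shape)               # (14, 14) (6, 6)
```

The nested loops make the arithmetic explicit; in practice each convolution is typically lowered to a single large matrix multiplication, which is precisely the operation that parallel hardware accelerates.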