Evaluation of spiking neural networks requires fetching a large number of synaptic weights to update postsynaptic neurons. This limits parallelism and becomes a bottleneck for hardware. We present an approach to spike propagation based on a probabilistic interpretation of weights, which reduces memory accesses and updates. We study the effects of introducing randomness into spike processing, and show on benchmark networks that this can be done with minimal impact on recognition accuracy. We present an architecture and the accuracy trade-offs for fully connected and convolutional networks on the MNIST and CIFAR10 datasets, implemented on the Xilinx Zynq platform.
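To make the idea of a probabilistic interpretation of weights concrete, the following sketch treats the magnitude of each (suitably scaled) synaptic weight as a Bernoulli probability and delivers a fixed-magnitude update only when the draw succeeds. The scaling to [-1, 1], the unit update magnitude, and the function and variable names are illustrative assumptions for exposition, not the exact scheme evaluated in this paper; in hardware, skipped synapses correspond to weight fetches and postsynaptic updates that are avoided.

```python
import numpy as np

def propagate_spike_probabilistic(weights, membrane, pre_idx, rng):
    # weights : 2-D array, weights[pre, post], assumed scaled to [-1, 1]
    # membrane: 1-D array of postsynaptic membrane potentials (updated in place)
    # pre_idx : index of the presynaptic neuron that just spiked
    w = weights[pre_idx]                    # outgoing weights of the spiking neuron
    hit = rng.random(w.shape) < np.abs(w)   # Bernoulli draw per synapse
    membrane[hit] += np.sign(w[hit])        # fixed-magnitude update, sign taken from the weight
    return membrane

# Example usage on random weights (illustrative only)
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(4, 8))    # 4 presynaptic, 8 postsynaptic neurons
V = np.zeros(8)
propagate_spike_probabilistic(W, V, pre_idx=2, rng=rng)
```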
Introduction

Spiking neural networks are often referred to as the third generation of neural network models [10], and have several characteristics that make them attractive from the viewpoint of hardware design. They follow an event-driven model of computation, where the work done (and hence the energy consumed) can be made proportional to the number of spike events, and they do not require the arrays of multiply-accumulate (MAC) operations that characterize conventional artificial neural networks (including convolutional networks, referred to here as ANNs). It is these MAC arrays that make ANNs well suited to parallel implementation on architectures such as GPUs. In contrast, spiking neural networks may require other types of computation to determine whether a neuron fires or not. Hardware architectures for spiking networks (e.g. [2, 7, 4, 11]) therefore differ considerably from those for regular ANNs, and focus on features that enable efficient event-driven computation. This usually requires the network to be trained specifically for the target architecture (due to restrictions on permitted connections or weights), and it is not efficient to take a network trained for one architecture and run it directly on another.

Despite being event driven, spiking networks still require a large number of memory accesses, primarily for two purposes [11]: determining the recipient neurons of a spike, and fetching the weights of the corresponding synapses. Recent data (e.g. [8]) indicates that fetching data from memory (especially off-chip) is far more expensive in energy than arithmetic computation. For reasonably sized networks, the neuron weight and index information is too large to store on-chip; the resulting off-chip memory accesses therefore dominate the energy consumed by the computation, and can also increase latency.
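As a software-level illustration of these two classes of memory accesses, the sketch below processes a queue of spike events with a simple fire-and-reset neuron model, feedforward fanout lists, and a dense weight matrix. The data layout and names are assumptions made for clarity, not the architecture presented later; the two marked lines correspond to the recipient-index lookup and the per-synapse weight fetch that dominate memory traffic.

```python
from collections import deque
import numpy as np

def process_spike_events(initial_spikes, fanout, weights, threshold=1.0):
    # initial_spikes: indices of input neurons that spike at the start
    # fanout[i]     : list of postsynaptic neuron indices driven by neuron i
    # weights       : 2-D array, weights[pre, post]
    # Assumes feedforward (acyclic) connectivity so the event queue drains.
    membrane = np.zeros(weights.shape[0])
    queue = deque(initial_spikes)
    while queue:
        pre = queue.popleft()
        targets = fanout[pre]                        # access (1): fetch recipient neuron indices
        for post in targets:
            membrane[post] += weights[pre, post]     # access (2): fetch one synaptic weight per recipient
            if membrane[post] >= threshold:          # fire-and-reset neuron model
                membrane[post] = 0.0
                queue.append(post)
    return membrane
```

In this baseline, every spike triggers one index lookup and one weight fetch per outgoing synapse; it is the per-synapse weight traffic in step (2) that the probabilistic scheme outlined above aims to reduce.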