2021
DOI: 10.48550/arxiv.2105.08810
Preprint

Sparse Spiking Gradient Descent

Abstract: There is an increasing interest in emulating Spiking Neural Networks (SNNs) on neuromorphic computing devices due to their low energy consumption. Recent advances have allowed training SNNs to a point where they start to compete with traditional Artificial Neural Networks (ANNs) in terms of accuracy, while at the same time being energy efficient when run on neuromorphic hardware. However, the process of training SNNs is still based on dense tensor operations originally developed for ANNs which do not leverage …
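As an illustrative reading of the abstract and the citing works (not the authors' implementation), the core idea can be sketched in PyTorch: the spike nonlinearity keeps its hard Heaviside forward pass, while the backward pass uses a surrogate derivative that is nonzero only for neurons whose membrane potential lies within a window of the firing threshold, so the gradient tensors stay sparse. The class name SparseSurrogateSpike, the triangular surrogate, and the threshold/window values below are assumptions made for this sketch.

```python
import torch

class SparseSurrogateSpike(torch.autograd.Function):
    """Heaviside spike with a surrogate gradient restricted to the
    'active' neurons whose membrane potential is near the threshold."""

    @staticmethod
    def forward(ctx, v, threshold, window):
        spikes = (v >= threshold).float()
        # Only neurons within `window` of the threshold belong to the
        # active set that the backward pass needs to touch.
        active = (v - threshold).abs() < window
        ctx.save_for_backward(v, active)
        ctx.threshold = threshold
        ctx.window = window
        return spikes

    @staticmethod
    def backward(ctx, grad_output):
        v, active = ctx.saved_tensors
        # Triangular pseudo-derivative, masked to the active set; all other
        # entries are exactly zero, so sparse kernels could skip them.
        surrogate = torch.clamp(1.0 - (v - ctx.threshold).abs() / ctx.window, min=0.0)
        grad_v = grad_output * surrogate * active.float()
        return grad_v, None, None


# Usage: only the two neurons near the threshold receive a gradient.
v = torch.tensor([0.2, 0.9, 1.1, 2.5], requires_grad=True)
spikes = SparseSurrogateSpike.apply(v, 1.0, 0.5)
spikes.sum().backward()
print(spikes)   # tensor([0., 0., 1., 1.])
print(v.grad)   # nonzero only for the 0.9 and 1.1 entries
```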

Cited by 3 publications (3 citation statements)
References 35 publications
“…Kheradpisheh et al use temporal backpropagation (BS4NN) on dense BSNNs (Kheradpisheh et al, 2021). We further consider several additional promising lightweight approaches to training full precision SNNs, including sparse spiking gradient descent (SSGD) which reduces overhead during gradient calculations (Perez-Nieves & Goodman, 2021), and neural heterogeneity (NH) which compresses the required number of neurons with the addition of neuron-independent parameters (Perez-Nieves et al, 2021). Interestingly, BSNNs with threshold annealing can outperform full precision SNNs in terms of accuracy using considerably less memory for model parameters.…”
Section: Comparison With Lightweight SNNs
confidence: 99%
“…Recent experiments have shown that rate-coded networks (at the output) are robust to sparsity-promoting regularization terms [110], [111], [113]. However, networks that rely on time-to-first-spike schemes have had less success, which is unsurprising given that temporal outputs are already sparse.…”
Section: E. Activity Regularization
confidence: 99%
“…Using pseudo-derivatives for backpropagating through the non-differentiable threshold function, as we use for our discrete-time EGRU, was originally proposed for feedforward spiking networks in neuromorphic hardware in [16] and developed further in [3,66]. The sparsity of learning with BPTT when using appropriate pseudo-derivatives in a discrete-time feed-forward spiking neural network was recently described in [50].…”
Section: Related Work
confidence: 99%
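As a rough companion sketch to the statement above (again not taken from any of the cited papers), a masked pseudo-derivative of this kind can be dropped into a discrete-time leaky integrate-and-fire loop unrolled for BPTT. The function name lif_forward, the straight-through construction, and all hyperparameters are hypothetical choices for the illustration.

```python
import torch

def lif_forward(x, w, thr=1.0, beta=0.9, window=0.5, n_steps=20):
    """Discrete-time leaky integrate-and-fire layer unrolled for BPTT.

    The spike nonlinearity uses a straight-through pseudo-derivative that is
    masked to neurons whose membrane potential lies within `window` of the
    threshold, so only that active set contributes gradients."""
    v = torch.zeros(x.shape[0], w.shape[1])
    spikes = []
    for _ in range(n_steps):
        v = beta * v + x @ w                          # leaky integration of a constant input current
        surrogate = torch.clamp(1.0 - (v - thr).abs() / window, min=0.0)
        active = ((v - thr).abs() < window).float()   # sparse set touched by the backward pass
        hard = (v >= thr).float()                     # non-differentiable spike
        # The value of `spk` equals `hard`; gradients flow through the masked surrogate only.
        spk = hard.detach() + surrogate * active - (surrogate * active).detach()
        v = v - spk.detach() * thr                    # soft reset, detached from the graph
        spikes.append(spk)
    return torch.stack(spikes)

# Usage sketch: gradients reach `w` only through near-threshold neuron/time-step pairs.
x = torch.rand(8, 100)
w = (0.1 * torch.randn(100, 64)).requires_grad_()
lif_forward(x, w).sum().backward()
```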