Environmental sounds form part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, the high computational cost, hence high power consumption, remains a major hurdle for large-scale implementation of ASC systems on mobile and wearable devices. Motivated by the observations that humans are highly effective and consume little power whilst analyzing complex audio scenes, we propose a biologically plausible ASC framework, namely SOM-SNN. This framework uses the unsupervised self-organizing map (SOM) for representing frequency contents embedded within the acoustic signals, followed by an event-based spiking neural network (SNN) for spatiotemporal spiking pattern classification. We report experimental results on the RWCP environmental sound and TIDIGITS spoken digits datasets, which demonstrate competitive classification accuracies over other deep learning and SNN-based models. The SOM-SNN framework is also shown to be highly robust to corrupting noise after multi-condition training, whereby the model is trained with noise-corrupted sound samples. Moreover, we discover the early decision making capability of the proposed framework: an accurate classification can be made with an only partial presentation of the input.
Spiking Neural Networks (SNNs) use spatiotemporal spike patterns to represent and transmit information, which are not only biologically realistic but also suitable for ultralow-power event-driven neuromorphic implementation. Just like other deep learning techniques, Deep Spiking Neural Networks (DeepSNNs) benefit from the deep architecture. However, the training of DeepSNNs is not straightforward because the wellstudied error back-propagation (BP) algorithm is not directly applicable. In this paper, we first establish an understanding as to why error back-propagation does not work well in DeepSNNs. We then propose a simple yet efficient Rectified Linear Postsynaptic Potential function (ReL-PSP) for spiking neurons and a Spike-Timing-Dependent Back-Propagation (STDBP) learning algorithm for DeepSNNs where the timing of individual spikes is used to convey information (temporal coding), and learning (back-propagation) is performed based on spike timing in an event-driven manner. We show that DeepSNNs trained with the proposed single spike time-based learning algorithm can achieve state-of-the-art classification accuracy. Furthermore, by utilizing the trained model parameters obtained from the proposed STDBP learning algorithm, we demonstrate ultra-lowpower inference operations on a recently proposed neuromorphic inference accelerator. The experimental results also show that the neuromorphic hardware consumes 0.751 mW of the total power consumption and achieves a low latency of 47.71 ms to classify an image from the MNIST dataset. Overall, this work investigates the contribution of spike timing dynamics for information encoding, synaptic plasticity and decision making, providing a new perspective to the design of future DeepSNNs and neuromorphic hardware.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.