We present an end-to-end trainable, modular, event-driven neural architecture that uses local synaptic and threshold adaptation rules to perform transformations between arbitrary spatio-temporal spike patterns. The architecture is a highly abstracted model of existing Spiking Neural Network (SNN) architectures. The proposed Optimized Deep Event-driven Spiking neural network Architecture (ODESA) can simultaneously learn hierarchical spatio-temporal features at multiple arbitrary time scales. ODESA performs online learning without error back-propagation or the calculation of gradients. Through simple local adaptive selection thresholds at each node, the network rapidly learns to allocate its neuronal resources appropriately at each layer for any given problem, without using an error measure. These adaptive selection thresholds are the central feature of ODESA, ensuring network stability and remarkable robustness to noise as well as to the choice of initial system parameters. Network activations are inherently sparse due to a hard Winner-Take-All (WTA) constraint at each layer. We evaluate the architecture on existing spatio-temporal datasets, including the spike-encoded IRIS, latency-coded MNIST, Oxford spike-pattern, and TIDIGITS datasets, as well as a novel set of tasks based on International Morse Code. These tests demonstrate the hierarchical spatio-temporal learning capabilities of ODESA and show that it can solve practical, highly challenging hierarchical spatio-temporal learning tasks with the minimum possible number of computing nodes.
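The combination of a hard WTA constraint with local adaptive selection thresholds described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact update equations: the dot-product response, the weight-update rule, and the learning rates `eta_w` and `eta_t` are all assumptions made for the example.

```python
import numpy as np

def wta_layer_step(weights, thresholds, x, eta_w=0.01, eta_t=0.01):
    """One event-driven update of a hard-WTA layer with per-neuron
    adaptive selection thresholds (illustrative sketch only)."""
    # dot-product "response" of each neuron to the incoming event context x
    responses = weights @ x
    # only neurons whose response exceeds their own threshold are eligible
    eligible = responses >= thresholds
    if not eligible.any():
        # no neuron claimed the pattern: lower all thresholds slightly so the
        # layer learns to allocate a neuron to unseen patterns over time
        thresholds -= eta_t
        return None
    # hard WTA: a single winner among the eligible neurons
    winner = int(np.argmax(np.where(eligible, responses, -np.inf)))
    # local learning: pull the winner's weights toward the input...
    weights[winner] += eta_w * (x - weights[winner])
    # ...and raise its threshold so it specialises on this pattern
    thresholds[winner] += eta_t
    return winner
```

Because every update uses only quantities local to the winning neuron (its weights, its threshold, the current input), no gradients or error signals need to be propagated between layers, which is the property the abstract emphasises.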
Typically, a 1–2 MP CCTV camera generates around 7–12 GB of data per day. Frame-by-frame processing of such an enormous amount of data requires hefty computational resources. In recent years, compressive sensing approaches have achieved impressive compression results by reducing the sampling bandwidth, and different sampling mechanisms have been developed to incorporate compressive sensing into image and video acquisition. Although all-CMOS sensor cameras [1, 2] that perform compressive sensing can save substantial sampling bandwidth and minimize the memory required to store videos, traditional signal processing and deep learning models can operate only on the reconstructed data. However, most techniques for reconstructing the original uncompressed domain are computationally expensive and time-consuming. To bridge this gap, we propose a novel task: detection and localization of objects directly on the compressed frames, thereby eliminating the need to reconstruct the frames and reducing the search rate by up to 20× (the compression rate). The proposed model achieves 46.27% mAP on a GeForce GTX 1080 Ti. We also demonstrate real-time inference on an NVIDIA TX2 embedded board with 45.11% mAP, achieving the best balance between accuracy, inference time, and memory constraints.
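The 20× reduction claimed above follows directly from the compressive sensing measurement model, in which a length-N signal is observed through far fewer linear measurements. A minimal sketch, assuming a random Gaussian sensing matrix (a common choice in compressive sensing; the paper's actual sensing mechanism may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical frame flattened to a length-N signal
N = 1000           # original samples per frame (illustrative value)
M = N // 20        # 20x compression: only M linear measurements are kept
x = rng.standard_normal(N)

# random Gaussian measurement matrix Phi, normalised by sqrt(M)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# compressed measurement y = Phi @ x; the proposed detector would run
# directly on y, skipping the expensive reconstruction of x
y = Phi @ x
```

Since detection runs on `y` (M values) rather than on a reconstruction of `x` (N values), both the data to be searched and the memory footprint shrink by the compression factor.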