Modern computation based on the von Neumann architecture is today a mature, cutting-edge technology. In the von Neumann architecture, processing and memory units are implemented as separate blocks that exchange data intensively and continuously, and this data transfer is responsible for a large part of the power consumption. The next generation of computers is expected to solve problems at the exascale, i.e., 10^18 calculations per second. Even though these future computers will be incredibly powerful, if they are based on von Neumann-type architectures they will consume between 20 and 30 megawatts of power, and they will not have intrinsic, physically built-in capabilities to learn or to deal with complex data the way our brain does. These needs can be addressed by neuromorphic computing systems, which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to store and process large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving control from data centers to edge devices. The aim of this Roadmap is to present a snapshot of the present state of neuromorphic technology and to provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The Roadmap is a collection of perspectives in which leading researchers in the neuromorphic community provide their own view of the current state and the future challenges of each research area. We hope that this Roadmap will be a useful resource, providing a concise yet comprehensive introduction for readers outside this field and for those who are just entering it, as well as future perspectives for those who are well established in the neuromorphic computing community.
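To put these power figures in perspective, the following back-of-the-envelope arithmetic (our own, not taken from the Roadmap) converts sustained power at exascale throughput into energy per operation:

```python
# Energy per operation for an exascale machine at 20-30 MW
# (throughput and power figures as quoted in the abstract above).
ops_per_second = 1e18                      # exascale: 10^18 operations per second
for power_watts in (20e6, 30e6):           # 20 MW and 30 MW
    picojoules_per_op = power_watts / ops_per_second * 1e12
    print(f"{power_watts / 1e6:.0f} MW -> {picojoules_per_op:.0f} pJ per operation")
# 20 MW -> 20 pJ per operation
# 30 MW -> 30 pJ per operation
```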
Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks, including computer vision, data analytics, and robotics. The popularity of DNNs stems from the state-of-the-art inference accuracy they provide for these applications. However, this accuracy comes at the cost of high computational complexity, so it is becoming increasingly important to scale DNNs down so that they fit on resource-constrained hardware and edge devices. The main goal is to allow efficient processing of DNNs on low-power micro-AI platforms without compromising hardware resources or accuracy. In this work, we provide a comprehensive survey of recent developments in the energy-efficient deployment of DNNs on micro-AI platforms. To this end, we review neural architecture search strategies as part of micro-AI model design, give extensive details about model compression and quantization strategies in practice, and finally elaborate on current hardware approaches to efficient deployment of micro-AI models. The main takeaways for the reader are an understanding of the different search spaces used to pinpoint the best micro-AI model configuration, the ability to interpret different quantization and sparsification techniques, and insight into realizing micro-AI models on resource-constrained hardware together with the design considerations this entails.
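As a concrete illustration of one quantization strategy such a survey covers, the sketch below shows symmetric per-tensor post-training int8 quantization in NumPy; the function names and the per-tensor scaling choice are our own illustrative assumptions, not a specific method from the article:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor post-training quantization to int8 (illustrative)."""
    scale = np.max(np.abs(w)) / 127.0                 # map largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)      # hypothetical FC weight matrix
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"scale={s:.5f}, max reconstruction error={err:.5f}")
```

Per-channel scales and quantization-aware training typically recover more accuracy than this per-tensor scheme, at the cost of extra bookkeeping on the hardware side.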
The continuing impact of COVID-19 pulmonary infection has highlighted the importance of machine-aided diagnosis of its initial symptoms, such as fever, dry cough, fatigue, and dyspnea. This paper addresses the respiratory symptoms using a low-power, scalable software and hardware framework. We propose CoughNet, a flexible low-power CNN-LSTM processor that detects cough sounds in audio recordings. We analyze three different publicly available datasets and use them in our evaluation of cough detection. On the software side, we perform windowing and hyperparameter optimization to fit the network architecture to the hardware system. A scalable hardware prototype, designed in Verilog HDL on a Xilinx Kintex-7 160T FPGA, supports different numbers of processing engines and flexible bitwidths. The proposed hardware implementation has a low power consumption of ∼290 mW and an energy consumption of 2 mJ, about 99× less than the state-of-the-art implementation.
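To illustrate the kind of windowing step the paper describes, here is a minimal sliding-window framing of an audio signal in NumPy; the window length, hop size, and sampling rate are illustrative assumptions, not CoughNet's actual hyperparameters:

```python
import numpy as np

def frame_audio(signal: np.ndarray, sr: int, win_s: float = 1.0, hop_s: float = 0.5):
    """Split a 1-D audio signal into overlapping fixed-length windows."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    n_frames = 1 + max(0, (len(signal) - win) // hop)
    return np.stack([signal[i * hop : i * hop + win] for i in range(n_frames)])

sr = 16_000                                          # assumed sampling rate
audio = np.random.randn(5 * sr).astype(np.float32)   # 5 s of dummy audio
frames = frame_audio(audio, sr)                      # each row feeds a classifier
print(frames.shape)                                  # (9, 16000): 1 s windows, 0.5 s hop
```

Each window would then be transformed into a spectral feature map before being passed to a CNN-LSTM classifier; the window and hop lengths trade detection latency against per-window compute.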
In deep convolutional neural networks (DCNNs), model size and computational complexity are two important factors governing throughput and energy efficiency when the networks are deployed to hardware for inference. Recent compact DCNNs and pruning methods are effective, yet each has drawbacks. For instance, more than half the size of all MobileNet models lies in their last two layers, mainly because compact separable convolution (CONV) layers are not applicable to their last fully-connected (FC) layers. In pruning methods, compression is gained at the expense of irregularity in the DCNN architecture, which necessitates additional indexing memory to address the non-zero weights, thereby increasing memory footprint, decompression delay, and energy consumption. In this paper, we propose cyclic sparsely connected (CSC) architectures, with memory/computation complexity of O(N log N), where N is the number of nodes/channels in a given DCNN layer, that, contrary to compact depthwise separable layers, can be used as an overlay for both FC and CONV layers of complexity O(N²). Also, contrary to pruning methods, CSC architectures are structurally sparse and, owing to their cyclic nature, require no indexing. We show that both standard convolution and depthwise convolution layers are special cases of CSC layers, that their mathematical function, together with that of FC layers, can be unified into a single formulation, and that their hardware implementation can be carried out with a single arithmetic logic component. We examine the efficacy of CSC architectures for compressing the LeNet, AlexNet, and MobileNet DCNNs at precisions ranging from 2 to 32 bits. More specifically, we build upon the compact 8-bit quantized 0.5 MobileNet V1 and show that by compressing its last two layers with CSC architectures, the model is compressed by ∼1.5×, to a size of only 873 KB, with little accuracy loss. Lastly, we design a configurable hardware that implements all types of DCNN layers, including FC, CONV, depthwise, CSC-FC, and CSC-CONV, indistinguishably within a unified pipeline. We implement the hardware on a tiny Xilinx FPGA for total on-chip processing of the compressed MobileNet, which, compared to related work, achieves the highest Inference/J while utilizing the smallest FPGA.
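The following is a minimal sketch of structured cyclic sparsity applied to an FC layer, under our own simplified assumptions (a single mask in which each output connects to a cyclic set of log2(N) inputs); it illustrates the idea of index-free structured sparsity with ~N log N connections, not the paper's exact CSC construction:

```python
import numpy as np

def cyclic_mask(n: int) -> np.ndarray:
    """Binary mask: output j connects to inputs (j + 2^k) mod n, k = 0..log2(n)-1.
    ~N log N connections whose locations follow from j alone, so no index memory
    is needed to locate the non-zero weights."""
    fan_in = int(np.log2(n))
    mask = np.zeros((n, n), dtype=np.float32)
    for j in range(n):
        for k in range(fan_in):
            mask[j, (j + 2**k) % n] = 1.0
    return mask

n = 64
mask = cyclic_mask(n)
w = np.random.randn(n, n).astype(np.float32) * mask   # structurally sparse weights
x = np.random.randn(n).astype(np.float32)
y = w @ x                                             # masked FC forward pass
print(f"nonzeros: {int(mask.sum())} of {n * n}")      # 384 of 4096 (~N log2 N)
```

Because every output's input set is a fixed cyclic shift of the same offset pattern, a hardware pipeline can generate the weight addresses arithmetically instead of storing them, which is the property that distinguishes this style of sparsity from unstructured pruning.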