Deep Neural Networks (DNNs) have shown significant advantages in many domains such as pattern recognition, prediction, and control optimization. The demand for edge computing in the Internet-of-Things era has motivated the development of many kinds of computing platforms to accelerate DNN operations. The most common platforms are CPU, GPU, ASIC, and FPGA. However, these platforms suffer from low performance (i.e., CPU and GPU), large power consumption (i.e., CPU, GPU, ASIC, and FPGA), or low computational flexibility at runtime (i.e., FPGA and ASIC). In this paper, we propose the Network-on-Chip (NoC)-based DNN platform as a new accelerator design paradigm. NoC-based designs can reduce off-chip memory accesses through a flexible interconnect that facilitates data exchange between processing elements on the chip. We first comprehensively investigate conventional platforms and methodologies used in DNN computing. Then we study and analyze different design parameters to implement the NoC-based DNN accelerator. The presented accelerator is based on mesh topology, neuron clustering, random mapping, and XY routing. The experimental results on the LeNet, MobileNet, and VGG-16 models show the benefits of the NoC-based DNN accelerator in reducing off-chip memory accesses and improving runtime computational flexibility.
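
To make the routing choice concrete, the following is a minimal sketch of dimension-ordered XY routing on a 2D mesh, the scheme named above; the coordinate representation, function name, and mesh size are illustrative assumptions rather than the paper's implementation.

```python
def xy_route(src, dst):
    """Hop-by-hop path from src to dst on a 2D mesh using
    dimension-ordered XY routing: traverse X first, then Y.
    src and dst are (x, y) router coordinates (assumed layout)."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    # Move along the X dimension until the column matches the destination.
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    # Then move along the Y dimension until the row matches.
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# Example: route a flit from PE (0, 0) to PE (2, 3) on a 4x4 mesh.
print(xy_route((0, 0), (2, 3)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```

Because the X offset is always resolved before the Y offset, XY routing is deterministic and deadlock-free on a mesh, which is one reason it is a common baseline choice for NoC designs.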