With the development of machine learning technology, the exploration of energy-efficient and flexible architectures for object inference algorithms has attracted growing interest in recent years. However, few publications concentrate on coarse-grained reconfigurable architectures (CGRAs) for object inference algorithms. This paper provides a stream processing, dual-track programming CGRA-based approach to address the inherent computing characteristics of algorithms in object inference. Based on the proposed approach, an architecture called SDT-CGRA is presented as an implementation prototype. To evaluate its performance, the SDT-CGRA is realized in Verilog HDL and implemented in the SMIC 55-nm process, with a footprint of 5.19 mm² at 450 MHz. Seven object inference algorithms, including CNN, k-means, PCA, SPM, linear-SVM, Softmax, and Joint-Bayesian, are selected as benchmarks. The experimental results show that the SDT-CGRA achieves on average 343.8 times and 17.7 times higher energy efficiency for Softmax, PCA, and CNN, and 621.0 times and 1261.8 times higher energy efficiency for the k-means, SPM, linear-SVM, and Joint-Bayesian algorithms, compared with the Intel Xeon E5-2637 CPU and the Nvidia TitanX GPU, respectively. Compared with state-of-the-art implementations of AlexNet on FPGA and CGRA, the proposed SDT-CGRA achieves a 1.78 times increase in energy efficiency and a 13 times speedup, respectively.