2019
DOI: 10.1109/mm.2019.2908101

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Abstract: This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. Techniques for in-situ arithmetic in SRAM arrays, efficient data mapping, and reduced data movement are proposed. The Neural Cache architecture is capable of fully executing convolutional, fully connected, and pooling layers in-cache. The proposed architecture also supports quantization in-cache. Our experim…
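
To make the paper's bit-serial, bit-sliced style of in-SRAM computation concrete, here is a minimal Python sketch. It is not the paper's hardware or any real API: the names bit_slice, bit_serial_add, and from_slices are illustrative, the model is purely software, and the actual design also covers multiplication, convolution mapping, and in-cache reduction, all omitted here. It only shows the core idea that each bit position is stored as a row and each element occupies its own column, so one pass over the bit positions operates on every element in parallel.

import numpy as np

def bit_slice(values, bits):
    # Transpose integers into a (bits, lanes) array of 0/1 rows, least-significant bit first.
    values = np.asarray(values, dtype=np.uint32)
    return np.stack([(values >> b) & 1 for b in range(bits)])

def bit_serial_add(a_slices, b_slices):
    # Add two bit-sliced operands; one step per bit position, all lanes in parallel.
    bits, lanes = a_slices.shape
    carry = np.zeros(lanes, dtype=np.uint32)
    out = np.zeros((bits + 1, lanes), dtype=np.uint32)
    for b in range(bits):                      # one "cycle" per bit position
        s = a_slices[b] + b_slices[b] + carry  # a full adder in every lane
        out[b] = s & 1
        carry = s >> 1
    out[bits] = carry                          # final carry-out
    return out

def from_slices(slices):
    # Collapse bit slices back into ordinary integers, for checking only.
    return sum(int(1 << b) * row for b, row in enumerate(slices))

a, b = [3, 7, 12, 15], [1, 2, 5, 15]
print(from_slices(bit_serial_add(bit_slice(a, 4), bit_slice(b, 4))))  # [ 4  9 17 30]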

Cited by 15 publications (8 citation statements). References 31 publications.
“…to 4 OoO cores. The area overhead of the L1 in-cache computing unit is 0.5% of the core area using the estimates by Eckert et al (2018) and Wikichip (2016), which is not significant. Hence, it will not be considered for the rest of the paper.…”
Section: Architectural Exploration
confidence: 78%
“…In-cache computing allows massive Single Instruction Multiple Data (SIMD)-like operations to be performed in the cache hierarchy as proposed by Jeloka et al (2016). In our work, we use an in-cache computing architecture similar to BLADE proposed by , targeted for the L1 cache of ARM-based many-core systems, as opposed to the Last Level Cache (LLC), as in NeuralCache proposed by Eckert et al (2018). Regarding HBM proposed by Lee et al (2014), emerging memory architectures have been explored, but mainly for GPUs, as discussed in Chatterjee et al (2017).…”
Section: Related Work
confidence: 99%
“…A fair comparison to [3] is, however, difficult as it considers a complete system; PPAC would need to be integrated into a system for a fair comparison. We note, however, that if the method in [3] is used to compute MVPs, an element-wise multiplication between two vectors whose entries are L-bit requires L² + 5L − 2 clock cycles [4], which is a total of 34 clock cycles for 4-bit numbers. Then, the reduction (via sum) of an N-dimensional vector with L bits per entry requires O(L · log₂(N)) clock cycles, which is at least 64 clock cycles for a 256-dimensional vector with 8-bit entries (as the product of two 4-bit numbers results in 8-bit).…”
Section: B. Comparison With Existing Accelerators
confidence: 99%
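
The cycle counts quoted in this excerpt are easy to reproduce. The short check below takes the L² + 5L − 2 multiplication cost and the L · log₂(N) reduction bound at face value from the excerpt itself rather than from an independent reading of [4], so treat it as a sanity check of the arithmetic only.

from math import log2

def mult_cycles(L):
    # Bit-serial element-wise multiply of two L-bit operands: L^2 + 5L - 2 cycles.
    return L * L + 5 * L - 2

def reduce_cycles(L, N):
    # Lower bound quoted for summing an N-entry vector of L-bit values: L * log2(N).
    return int(L * log2(N))

print(mult_cycles(4))         # 34 cycles for the 4-bit element-wise multiply
print(reduce_cycles(8, 256))  # 64 cycles to reduce 256 products of 8 bits each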
“…Hence, an inner product between two 4-bit vectors with 256 entries requires at least 98 clock cycles, whereas PPAC requires only 16 clock cycles for the same operation. This significant difference in the number of clock cycles is caused by the fact that the design in [4] is geared towards data-centric applications in which element-wise operations are performed between high-dimensional vectors to increase parallelism. PPAC aims at accelerating a wide range of MVP-like operations, which is why we included dedicated hardware (such as the row pop-count) to speed up element-wise vector multiplication and vector sum-reduction.…”
Section: B. Comparison With Existing Accelerators
confidence: 99%
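
The 98-cycle figure in this excerpt is simply the sum of the two costs quoted in the previous excerpt; as a quick tally under the same assumed formulas:

multiply_cycles = 4 * 4 + 5 * 4 - 2        # 34 cycles for the 4-bit element-wise multiply
reduction_cycles = 8 * 8                   # 64 cycles: 8-bit products, log2(256) = 8
print(multiply_cycles + reduction_cycles)  # 98 cycles in total, versus the 16 quoted for PPAC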
“…Near data processing is applicable not only in traditional memory, such as SRAM [4, 5] and DRAM [2, 6-10], but also in emerging memory, such as PCM [11], STT-MRAM [12], and ReRAM [13]. There are also various attempts to reduce the data movement overhead by computation offloading to storage devices [3, 14-16].…”
Section: Introduction
confidence: 99%