2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) 2017
DOI: 10.1109/hpca.2017.21
Compute Caches

Cited by 264 publications (233 citation statements)
References 22 publications
“…An obvious limitation of the approach is that it does not offer a systematic way to locate all the possible places in which CiM-enabled macros could be inserted, which inevitably underestimates the benefits of CiM. Unlike most current works, [22] explores CiM at three levels of the SRAM cache hierarchy and completes the control flow inside the cache in the absence of data locality. However, its limitation is the same as that of [19]: it requires customized benchmarks for data locality.…”
Section: Related Work
confidence: 99%
See 1 more Smart Citation
“…An obvious limitation to the approach is that it does not offer a systematic way to locate all the possible places in which CiM-enabled macros could be inserted, which inevitably underestimate the benefits of CiM. Different from most current works, [22] explores CiM in three levels of SRAM cache hierarchy and completes the control flow inside cache in the absence of data locality. However, its limitation is the same as [19], which requires customized benchmarks for data locality.…”
Section: Related Workmentioning
confidence: 99%
“…Recent works (e.g., [4,15,16,17,18,19,20,21,22,23,24,25]) in both CMOS Static Random Access Memory (SRAM) and emerging non-volatile memories (NVMs) have demonstrated various CiM designs at different levels of the memory hierarchy. These designs allow computation to occur exactly where data resides, thereby reducing the energy and performance overheads associated with data movement.…”
Section: Introduction
confidence: 99%
“…While the hardware designs in [10], [19], [24] are specialized to carry out 1-bit MVPs and the designs in [6], [23] to execute multi-bit MVPs for neural network inference, PPAC is programmable to perform not only these operations but also GF(2) MVPs, Hamming-similarity computations, and PLA or CAM functionality, opening up its use in a wide range of applications. In this sense, PPAC is similar to the work in [3], where PIM is used to accelerate multiple applications, such as database query processing, cryptographic kernels, and in-memory checkpointing. A fair comparison to [3] is, however, difficult, as it considers a complete system; PPAC would need to be integrated into a system for a fair comparison.…”
Section: B. Comparison with Existing Accelerators
confidence: 99%
“…We note, however, that if the method in [3] is used to compute MVPs, an element-wise multiplication between two vectors whose entries are L-bit requires L² + 5L − 2 clock cycles [4], which is a total of 34 clock cycles for 4-bit numbers.…”
Section: B. Comparison with Existing Accelerators
confidence: 99%
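The cycle-count claim in the excerpt above can be sanity-checked with a short sketch. The formula L² + 5L − 2 for a bit-serial element-wise multiplication of L-bit operands is quoted directly from the excerpt (attributed there to [4]); the function name below is ours, not from any of the cited works.

```python
def elementwise_mult_cycles(bits: int) -> int:
    """Clock cycles for a bit-serial element-wise multiply of two
    vectors with `bits`-bit entries, per the formula quoted above."""
    return bits ** 2 + 5 * bits - 2

# For 4-bit numbers: 16 + 20 - 2 = 34 cycles, matching the excerpt.
print(elementwise_mult_cycles(4))
```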
“…Some recent research works have started to explore and evaluate the performance of this concept. It has been applied both to volatile memories [7], [8], [9] and to non-volatile memories [10], [11].…”
Section: Related Work, A. In-Memory Computing
confidence: 99%