2016
DOI: 10.1007/978-3-319-30695-7_2
Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube

Abstract: 3D integration of solid-state memories and logic, as demonstrated by the Hybrid Memory Cube (HMC), offers major opportunities for revisiting near-memory computation and gives new hope to mitigate the power and performance losses caused by the "memory wall". Several publications in the past few years demonstrate this renewed interest. In this paper we present the first exploration steps towards design of the Smart Memory Cube (SMC), a new Processor-in-Memory (PIM) architecture that enhances the capabilities of …

Cited by 39 publications
(45 citation statements)
References 18 publications
“…

  Simulator           Year  Category        NMC capabilities
  Sinuca [88]         2015  Cycle-Accurate  Yes
  HMC-SIM [89]        2016  Cycle-Accurate  Limited
  CasHMC [90]         2016  Cycle-Accurate  No
  SMC [31]            2016  Cycle-Accurate  Yes
  CLAPPS [82]         2017  Cycle-Accurate  Yes
  Ramulator-PIM [91]  2019  Cycle-Accurate  Yes

2) Simulation-based modeling allows one to achieve more accurate performance numbers. Architects often resort to modeling the entire micro-architecture precisely.…”
Section: Simulator
confidence: 99%
“…Even though ConvNets are computation-intensive workloads, and extremely high energy efficiencies have been previously reported for their ASIC implementations [18], [19], [17], the scalability and energy efficiency of modern ConvNets are ultimately bound by the main memory where their parameters and channels need to be stored (see Subsection II-B). This makes them interesting candidates for near-memory computation, which offers them plenty of bandwidth at a lower cost and without much buffering compared to off-chip accelerators, thanks to the lower memory access latency (a consequence of Little's law [24]).…”
Section: Introduction
confidence: 99%
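The buffering argument in the excerpt above follows directly from Little's law (L = λ·W): the number of requests a device must keep in flight to sustain a given bandwidth grows linearly with access latency. A minimal sketch of that arithmetic, with purely hypothetical bandwidth, latency, and line-size numbers chosen for illustration:

```python
# Little's law: L = lambda * W -- the average number of items in a system
# equals the arrival rate times the average time each item spends there.
# For a memory interface, L is the number of outstanding (buffered)
# requests needed to sustain a target bandwidth at a given latency.

def requests_in_flight(bandwidth_gb_s: float, latency_ns: float,
                       request_bytes: int) -> float:
    """Outstanding requests needed to sustain the given bandwidth."""
    bytes_per_ns = bandwidth_gb_s                  # 1 GB/s == 1 byte/ns
    arrival_rate = bytes_per_ns / request_bytes    # requests per ns (lambda)
    return arrival_rate * latency_ns               # L = lambda * W

# Hypothetical comparison: an off-chip accelerator seeing 50 ns access
# latency vs. near-memory logic seeing 10 ns, both sustaining 128 GB/s
# with 64-byte requests.
print(requests_in_flight(128.0, 50.0, 64))  # 100.0 outstanding requests
print(requests_in_flight(128.0, 10.0, 64))  # 20.0 outstanding requests
```

At the same bandwidth, the near-memory case needs one fifth of the buffering, which is the cost advantage the excerpt attributes to lower access latency.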
“…Within this organization, the processor can be implemented as a sophisticated standard superscalar processor and may contain a vector unit, as is the case with the Intelligent RAM (IRAM), [17]. The memory integrated into the chip can be realized as SRAM or embedded DRAM, which is basically accessed through the processor's cache memory, [23]. Because the processor and the memory are physically close, the integrated chip can achieve higher memory bandwidth, reduced memory latency and decreased power consumption compared to today's conventional memory chips and cache memories in multi-processing systems, [12].…”
Section: Comparative Analysis
confidence: 99%
“…Other scientists have suggested innovations in the DRAM memory architecture itself, [31]. This research has resulted in several DRAM solutions, including: asymmetric DRAM (provides non-uniform access to DRAM banks), Reduced Latency DRAM (RLDRAM), Fast Cycle DRAM (FCRAM, which divides each row into several sub-rows), SALP systems (Subarray-Level Parallelism, which allows overlapping of the bank-access latencies of multiple requests that go to different subarrays within the same bank), Enhanced DRAM and Virtual Channel DRAM (which add an SRAM buffer to DRAM memory in order to cache the most frequently accessed data), Tiered-Latency DRAM (TL-DRAM, which uses shorter bit lines with fewer cells), the Hybrid Memory Cube (which stacks several memory dies on top of each other in a 3D cube shape) and embedded DRAM (eDRAM, which is integrated on the same die as the processor), [23], [32]-[35].…”
Section: Overview Of Techniques For Improving Memory Latency In Proce…
confidence: 99%