Improving DRAM performance by parallelizing refreshes with accesses

Chang, Ki-Ho; Lee, Donghyuk; Chishti, Zeshan; Alameldeen, Alaa R.; Wilkerson, Chris; Kim, Yoongu; Mutlu, Onur

doi:10.1109/hpca.2014.6835946

Cited by 178 publications

(160 citation statements)

References 23 publications

Mentioning: 156

“…Figure 1a shows the internal organization of a DRAM subarray [8,34,36,62], which consists of a 2-D array of DRAM cells connected to a single row of sense amplifiers (a row of sense amplifiers is also referred to as a row buffer). The sense amplifier is a component that essentially acts as a latch: it detects the data stored in the DRAM cell and latches on to the corresponding data.…”

Section: DRAM Background (mentioning, confidence: 99%)
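To make the row-buffer behavior in the excerpt above concrete, here is a minimal Python sketch of a single subarray. The class and method names (`Subarray`, `activate`, `read`, `write`, `precharge`) are hypothetical, and the model ignores all analog and timing behavior: activating a row simply copies its cells into the row buffer, which stands in for the latching sense amplifiers.

```python
from typing import List, Optional


class Subarray:
    """Toy model of one DRAM subarray: a 2-D cell array sharing a single row buffer.

    Hypothetical illustration only; timing and charge-sharing effects are ignored.
    """

    def __init__(self, rows: int, cols: int) -> None:
        self.cells: List[List[int]] = [[0] * cols for _ in range(rows)]
        self.row_buffer: Optional[List[int]] = None  # latched contents of the open row
        self.open_row: Optional[int] = None

    def activate(self, row: int) -> None:
        # The sense amplifiers detect the selected row's values and latch them.
        if self.open_row is not None:
            raise RuntimeError("a row is already open; precharge first")
        self.row_buffer = list(self.cells[row])
        self.open_row = row

    def read(self, col: int) -> int:
        # Column reads are served from the latched row buffer.
        if self.row_buffer is None:
            raise RuntimeError("no open row; activate a row first")
        return self.row_buffer[col]

    def write(self, col: int, value: int) -> None:
        # Writes update the row buffer; the sense amplifiers also drive the
        # new value back into the open row's cells, modeled here directly.
        if self.row_buffer is None or self.open_row is None:
            raise RuntimeError("no open row; activate a row first")
        self.row_buffer[col] = value
        self.cells[self.open_row][col] = value

    def precharge(self) -> None:
        # Close the open row so that a different row can be activated.
        self.row_buffer = None
        self.open_row = None
```

In this model only one row per subarray can be open at a time, so accessing a different row requires a precharge followed by a new activation.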

“…Low Latency DRAM Architectures: Previous works [8,16,34,36,45,51,61,62,66,75] propose new DRAM architectures that provide lower latency. These works improve DRAM latency at the cost of either significant additional DRAM chip area (i.e., extra sense amplifiers [45,61,66] or additional SRAM cache [16,75]), specialized protocols [8,34,36,62] or both.…”

Section: Related Work (mentioning, confidence: 99%)


Adaptive-latency DRAM: Optimizing DRAM timing for the common-case

Lee, Kim, Pekhimenko, et al. 2015

2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)

Self Cite

194

328


In current systems, memory accesses to a DRAM chip must obey a set of minimum latency restrictions specified in the DRAM standard. Such timing parameters exist to guarantee reliable operation. When deciding the timing parameters, DRAM manufacturers incorporate a very large margin as a provision against two worst-case scenarios. First, due to process variation, some outlier chips are much slower than others and cannot be operated as fast. Second, chips become slower at higher temperatures, and all chips need to operate reliably at the highest supported (i.e., worst-case) DRAM temperature (85 °C). In this paper, we show that typical DRAM chips operating at typical temperatures (e.g., 55 °C) are capable of providing a much smaller access latency, but are nevertheless forced to operate at the largest latency of the worst case.

Our goal in this paper is to exploit the extra margin that is built into the DRAM timing parameters to improve performance. Using an FPGA-based testing platform, we first characterize the extra margin for 115 DRAM modules from three major manufacturers. Our results demonstrate that it is possible to reduce four of the most critical timing parameters by a minimum/maximum of 17.3%/54.8% at 55 °C without sacrificing correctness. Based on this characterization, we propose Adaptive-Latency DRAM (AL-DRAM), a mechanism that adaptively reduces the timing parameters for DRAM modules based on the current operating condition. AL-DRAM does not require any changes to the DRAM chip or its interface.

We evaluate AL-DRAM on a real system that allows us to reconfigure the timing parameters at runtime. We show that AL-DRAM improves the performance of memory-intensive workloads by an average of 14% without introducing any errors. We discuss and show why AL-DRAM does not compromise reliability. We conclude that dynamically optimizing the DRAM timing parameters can reliably improve system performance.
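The abstract's core idea, choosing timing parameters from the current operating condition instead of always using the worst case, can be illustrated with a small lookup. This Python sketch is hypothetical: the parameter names (tRCD, tRAS, tWR, tRP), the nominal nanosecond values, and the single 55 °C profile are illustrative assumptions. The reduced profile applies the abstract's 17.3% minimum reduction uniformly to every parameter, which is a simplification and not a per-module characterization.

```python
from typing import Dict, List, Tuple

Timings = Dict[str, float]

# Hypothetical worst-case (standard) timings in ns; illustrative values only.
NOMINAL_NS: Timings = {"tRCD": 13.75, "tRAS": 35.0, "tWR": 15.0, "tRP": 13.75}
WORST_CASE_TEMP_C = 85.0


def reduced(timings: Timings, fraction: float) -> Timings:
    """Return a copy of `timings` with every parameter shortened by `fraction`."""
    return {name: ns * (1.0 - fraction) for name, ns in timings.items()}


# (highest temperature in °C the profile is valid for, timing set), ascending.
# The 55 °C profile uses a uniform 17.3% reduction purely as an assumption.
PROFILES: List[Tuple[float, Timings]] = [
    (55.0, reduced(NOMINAL_NS, 0.173)),
    (WORST_CASE_TEMP_C, NOMINAL_NS),
]


def select_timings(temp_c: float) -> Timings:
    """Pick the fastest profile that is still valid at the current temperature."""
    for max_temp_c, timings in PROFILES:
        if temp_c <= max_temp_c:
            return timings
    raise ValueError(f"{temp_c} °C is above the supported {WORST_CASE_TEMP_C} °C")
```

A controller following this scheme would call `select_timings` again whenever the measured temperature changes; above 55 °C it falls back to the standard timings, so the full worst-case margin is kept only where it is still needed.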


“…Unfortunately, further scaling of DRAM cells has become costly [5,77,53,40,62,1] due to increased manufacturing complexity/cost, reduced cell reliability, and potentially increased cell leakage leading to high refresh rates. Several key issues to tackle include: 1) reducing the negative impact of refresh on energy, performance, QoS, and density scaling [71,72,17], 2) improving DRAM parallelism/bandwidth [57,17], latency [68], and energy efficiency [57,68,71], 3) improving reliability of DRAM at low cost [90,75,58,51], 4) reducing the significant amount of waste present in today's main memories in which much of the fetched/stored data can be unused due to coarse-granularity management [79,117,94,95,110], 5) minimizing data movement between DRAM and processing elements, which causes high latency, energy, and bandwidth consumption [102].…”

Section: Challenge 1: New DRAM Architectures (mentioning, confidence: 99%)

“…Chang et al. [17] discuss mechanisms to improve the parallelism between reads and writes, and Kang et al. [50] discuss the use of SALP as a way of tolerating long write latencies to DRAM, which they identify as one of the three key scaling challenges for DRAM, alongside refresh and variable retention time. We refer the reader to these works for more information about these parallelization techniques.…”

Section: Improving DRAM Parallelism (mentioning, confidence: 99%)
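The overlap idea behind subarray-level parallelism (SALP), mentioned in the excerpt above, can be illustrated with an idealized busy-time calculation for one bank. This Python sketch is a deliberately simplified model and not the SALP mechanism itself: requests are hypothetical (subarray, latency) pairs, the conventional bank serializes all of them, and the idealized parallel bank serializes only requests to the same subarray, ignoring any structures the subarrays still share.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Tuple

Request = Tuple[int, float]  # (subarray index, access latency in ns)


def bank_busy_time(requests: Iterable[Request], subarray_parallel: bool) -> float:
    """Idealized time for one bank to serve `requests` (hypothetical model)."""
    reqs = list(requests)
    if not subarray_parallel:
        # Conventional bank: one access at a time, regardless of subarray.
        return sum((latency for _, latency in reqs), 0.0)
    # Idealized subarray parallelism: different subarrays fully overlap,
    # while accesses to the same subarray still serialize.
    per_subarray: DefaultDict[int, float] = defaultdict(float)
    for subarray, latency in reqs:
        per_subarray[subarray] += latency
    return max(per_subarray.values(), default=0.0)
```

For example, with hypothetical latencies, a 50 ns write to subarray 0, a 50 ns read from subarray 1, and a 20 ns read from subarray 0 take 120 ns when serialized but 70 ns when the two subarrays overlap; this is the sense in which a long write latency can be hidden behind accesses to other subarrays.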

Main Memory Scaling: Challenges and Solution Directions

Mutlu, 2015

More Than Moore Technologies for Next Generation Computer Design

Self Cite

The memory system is a fundamental performance and energy bottleneck in almost all computing systems. Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck. At the same time, DRAM technology is experiencing difficult technology scaling challenges that make the maintenance and enhancement of its capacity, energy-efficiency, and reliability significantly more costly with conventional techniques.

In this chapter, after describing the demands and challenges faced by the memory system, we examine some promising research and design directions to overcome challenges posed by memory scaling. Specifically, we describe three major solution directions: 1) enabling new DRAM architectures, functions, interfaces, and better integration of the DRAM and the rest of the system (an approach we call system-DRAM co-design), 2) designing a memory system that employs emerging non-volatile memory technologies and takes advantage of multiple different technologies (i.e., hybrid memory systems), 3) providing predictable performance and QoS to applications sharing the memory system (i.e., QoS-aware memory systems). We also briefly describe our ongoing related work in combating scaling challenges of NAND flash memory.

Introduction

Main memory is a critical component of all computing systems, employed in server, embedded, desktop, mobile and sensor environments. Memory capacity, energy, cost, performance, and management algorithms must scale as we scale the size of the computing system in order to maintain performance growth and enable new applications. Unfortunately, such scaling has become difficult because recent trends in systems, applications, and technology greatly exacerbate the memory system bottleneck.


“…
Commodity: DDR3 (2007) [14]; DDR4 (2012) [18]
Low-Power: LPDDR3 (2012) [17]; LPDDR4 (2014) [20]
Graphics: GDDR5 (2009) [15]
Performance: eDRAM [28], [32]; RLDRAM3 (2011) [29]
3D-Stacked: WIO (2011) [16]; WIO2 (2014) [21]; MCDRAM (2015) [13]; HBM (2013) [19]; HMC1.0 (2013) [10]; HMC1.1 (2014) [11]
Academic: SBA/SSA (2010) [38]; Staged Reads (2012) [8]; RAIDR (2012) [27]; SALP (2012) [24]; TL-DRAM (2013) [26]; RowClone (2013) [37]; Half-DRAM (2014) [39]; Row-Buffer Decoupling (2014) [33]; SARP (2014) [6]; AL-DRAM (2015) [25]

At the forefront of such innovations should be DRAM simulators, the software tool with which to evaluate the strengths and weaknesses of each new proposal. However, DRAM simulators have been lagging behind the rapid-fire changes to DRAM.…”

Section: Segment / DRAM Standards and Architectures (mentioning, confidence: 99%)

Ramulator: A Fast and Extensible DRAM Simulator

Kim, Yang, Mutlu, 2016

IEEE Comput. Arch. Lett.

Self Cite

523

228


Recently, both industry and academia have proposed many different roadmaps for the future of DRAM. Consequently, there is a growing need for an extensible DRAM simulator, which can be easily modified to judge the merits of today's DRAM standards as well as those of tomorrow. In this paper, we present Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility. Unlike existing simulators, Ramulator is based on a generalized template for modeling a DRAM system, which is only later infused with the specific details of a DRAM standard. Thanks to such a decoupled and modular design, Ramulator is able to provide out-of-the-box support for a wide array of DRAM standards: DDR3/4, LPDDR3/4, GDDR5, WIO1/2, HBM, as well as some academic proposals (SALP, AL-DRAM, TL-DRAM, RowClone, and SARP). Importantly, Ramulator does not sacrifice simulation speed to gain extensibility: according to our evaluations, Ramulator is 2.5× faster than the next fastest simulator. Ramulator is released under the permissive BSD license.
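The decoupling described in the Ramulator abstract, a generic DRAM model that is only later filled in with a specific standard's details, can be sketched as follows. This is not Ramulator's code or API (Ramulator itself is written in C++); the classes `StandardSpec` and `GenericDRAM`, the hierarchy counts, and the timing values are hypothetical and serve only to show the separation between generic logic and standard-specific data.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class StandardSpec:
    """Standard-specific details plugged into the generic model (hypothetical)."""
    name: str
    # Organization levels, outermost first, with the count per parent level.
    hierarchy: Tuple[Tuple[str, int], ...]
    # Minimum delay in ns between a previous command and a next command.
    min_gap_ns: Dict[Tuple[str, str], float]


class GenericDRAM:
    """Standard-agnostic logic: nothing here names a particular standard."""

    def __init__(self, spec: StandardSpec) -> None:
        self.spec = spec

    def count(self, level: str) -> int:
        # Total number of units at `level`, e.g. all banks across all ranks.
        total = 1
        for name, per_parent in self.spec.hierarchy:
            total *= per_parent
            if name == level:
                return total
        raise KeyError(f"{self.spec.name} has no level named {level!r}")

    def earliest_issue_ns(self, prev_cmd: str, prev_time_ns: float, next_cmd: str) -> float:
        # Command pairs without a listed constraint may issue immediately.
        return prev_time_ns + self.spec.min_gap_ns.get((prev_cmd, next_cmd), 0.0)


# Two hypothetical specs sharing the same generic model.
DDR3_LIKE = StandardSpec(
    name="DDR3-like",
    hierarchy=(("channel", 1), ("rank", 2), ("bank", 8)),
    min_gap_ns={("ACT", "RD"): 13.75, ("ACT", "PRE"): 35.0, ("PRE", "ACT"): 13.75},
)
SUBARRAY_AWARE = StandardSpec(
    name="subarray-aware variant",
    hierarchy=DDR3_LIKE.hierarchy + (("subarray", 8),),
    min_gap_ns=DDR3_LIKE.min_gap_ns,
)
```

Adding an organization level or a different timing table changes only the `StandardSpec` instance; `GenericDRAM` stays untouched, which mirrors the extensibility argument the abstract makes.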

