DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, where a single request activates thousands of bitlines in many DRAM chips, only to return a single cache line to the CPU. The focus on cost-per-bit is questionable in modern-day servers, where operating costs can easily exceed the purchase cost. Modern technology trends are also placing very different demands on the memory system: (i) queuing delays are a significant component of memory access time, (ii) there is a high energy premium for the level of reliability expected for business-critical computing, and (iii) the memory access stream emerging from multi-core systems exhibits limited locality. All of these trends necessitate an overhaul of DRAM architecture, even if it means a slight compromise in the cost-per-bit metric.

This paper examines three primary innovations. The first is a modification to DRAM chip microarchitecture that retains the traditional DDRx SDRAM interface. Selective Bitline Activation (SBA) waits for both RAS (row address) and CAS (column address) signals to arrive before activating exactly those bitlines that provide the requested cache line. SBA reduces energy consumption while incurring slight area and performance penalties. The second innovation, Single Subarray Access (SSA), fundamentally re-organizes the layout of DRAM arrays and the mapping of data to these arrays so that an entire cache line is fetched from a single subarray. It requires a different interface to the memory controller, reduces dynamic and background energy (by about 6X), incurs a slight area penalty (4%), and can even lead to performance improvements (up to 10%) by reducing queuing delays. The third innovation further penalizes the cost-per-bit metric by adding a checksum feature to each cache line. This checksum error-detection feature can then be used to build stronger RAID-like fault tolerance, including chipkill-level reliability. Such a technique is especially crucial for the SSA architecture, where the entire cache line is localized to a single chip. This DRAM chip microarchitectural change leads to a dramatic reduction in the energy and storage overheads for reliability. The proposed architectures will also apply to other emerging memory technologies (such as resistive memories) and will be less disruptive to standards, interfaces, and the design flow if they can be incorporated into first-generation designs.
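To make the overfetch argument concrete, the following back-of-envelope model (not from the paper; the per-bitline energy, row width, and rank width are illustrative assumptions) compares the bitline-activation energy of a conventional DDRx access, which activates a full row in every chip of a rank, with an SBA/SSA-style access that activates only the bitlines holding the requested 64-byte cache line.

```python
# Illustrative bitline-activation energy model (assumed parameters, not measured values).

E_BITLINE_PJ = 1.5          # assumed energy to activate one bitline, in picojoules
ROW_BITS_PER_CHIP = 8192    # assumed number of bitlines activated per chip per row activation
CHIPS_PER_RANK = 8          # typical x8 DIMM rank width, excluding ECC chips
CACHE_LINE_BYTES = 64       # size of the data actually returned to the CPU

def conventional_activation_energy_pj() -> float:
    """Overfetch: every chip in the rank activates a full row of bitlines."""
    return E_BITLINE_PJ * ROW_BITS_PER_CHIP * CHIPS_PER_RANK

def selective_activation_energy_pj() -> float:
    """SBA/SSA-style access: only the bitlines holding the requested line fire."""
    return E_BITLINE_PJ * CACHE_LINE_BYTES * 8

if __name__ == "__main__":
    conv = conventional_activation_energy_pj()
    sel = selective_activation_energy_pj()
    print(f"conventional: {conv / 1000:.1f} nJ per activation")
    print(f"selective:    {sel / 1000:.2f} nJ per activation")
    print(f"ratio:        {conv / sel:.0f}x fewer bitlines activated")
```

With these assumed numbers the conventional access activates two orders of magnitude more bitlines than the data it returns, which is the waste SBA and SSA target; the exact savings reported in the paper (about 6X dynamic and background energy for SSA) come from a full-system evaluation, not this toy model.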
It is expected that future on-chip networks for many-core processors will impose huge overheads in terms of energy, delay, complexity, verification effort, and area
Many of the pins on a modern chip are used for power delivery. If fewer pins were used to supply the same current, the wires and pins used for power delivery would have to carry larger currents over longer distances. This results in an "IR-drop" problem, where some of the voltage is dropped across the long resistive wires making up the power delivery network, and the eventual circuits experience fluctuations in their supplied voltage. The same problem also manifests if the pin count is the same but the current draw is higher. IR-drop can be especially problematic in 3D DRAM devices because (i) low cost (few pins and TSVs) is a high priority, (ii) 3D stacking increases current draw within the package without providing proportionate room for more pins, and (iii) TSVs add to the resistance of the power delivery network.

This paper is the first to characterize the relationship between the power delivery network and the maximum supported activity in a 3D-stacked DRAM memory device. The design of the power delivery network determines whether some banks can handle less activity than others. It also determines the combinations of bank activities that are permissible. Both of these attributes can feed into architectural policies. For example, if some banks can handle more activity than others, the architecture benefits by placing data from high-priority threads or data from frequently accessed pages into those banks. The memory controller can also derive higher performance if it schedules requests to specific combinations of banks that do not violate the IR-drop constraint. We first define an IR-drop-aware scheduler that encodes a number of activity constraints. This scheduler, however, falls short of the performance of an unrealistic ideal PDN.
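As a concrete illustration of the kind of activity constraint such a scheduler might encode (the bank-to-region grouping and the per-region activation limits below are hypothetical, not the paper's actual PDN characterization), this sketch checks whether issuing a new activate command would exceed the number of simultaneously active banks permitted in its region of the power delivery network.

```python
# Minimal sketch of an IR-drop-aware activate scheduler (assumed constraints).

# Hypothetical constraint: banks sharing a PDN region may not all be active at once.
# The mapping of banks to regions and the per-region limits are illustrative only.
BANK_TO_PDN_REGION = {bank: bank // 4 for bank in range(16)}   # 4 banks per region
MAX_ACTIVE_PER_REGION = {0: 3, 1: 3, 2: 2, 3: 2}               # assumed IR-drop limits

class IRDropAwareScheduler:
    def __init__(self):
        self.active_banks = set()

    def can_activate(self, bank: int) -> bool:
        """Return True if activating `bank` keeps its PDN region within its limit."""
        region = BANK_TO_PDN_REGION[bank]
        active_in_region = sum(
            1 for b in self.active_banks if BANK_TO_PDN_REGION[b] == region
        )
        return active_in_region < MAX_ACTIVE_PER_REGION[region]

    def activate(self, bank: int) -> bool:
        """Issue an ACT only if the IR-drop constraint allows it."""
        if self.can_activate(bank):
            self.active_banks.add(bank)
            return True
        return False          # caller must delay or reorder this request

    def precharge(self, bank: int):
        """Closing a bank frees up activation headroom in its PDN region."""
        self.active_banks.discard(bank)
```

In this simplified form the scheduler simply stalls or reorders requests that would violate a region's limit; the paper's point is that such constraint-aware scheduling alone cannot recover all the performance of an ideal, unconstrained PDN.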