A performance comparison of contemporary DRAM architectures

Vinodh Cuppu, Bruce Jacob, Brian Davis, Trevor Mudge

doi:10.1145/307338.300998

Cited by 67 publications

(80 citation statements)

References 23 publications

“…al. [7] demonstrated that memory manufacturers have successfully scaled data-rates of DRAM chips by employing pipelining but have not reduced the latency of DRAM device operations. In their follow-on paper [8] they showed that system bus configuration choices significantly impact overall system performance.…”

Section: Related Work (mentioning)

confidence: 99%

Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling

Ganesh, Jaleel, Wang, et al. 2007

2007 IEEE 13th International Symposium on High Performance Computer Architecture

Self Cite

“…Even with these significant redesigns, the cycle time, as measured by end-to-end access latency, has continued to improve at a rate significantly lower than microprocessor performance. These redesigns have been successful at improving bandwidth, but latency continues to be constrained by the area impact and cost pressures on DRAM core architectures [2].…”

Section: DRAM Architectures - Background (mentioning)

confidence: 99%

The New DRAM Interfaces: SDRAM, RDRAM and Variants

Davis, Jacob, Mudge 2000

Lecture Notes in Computer Science

Self Cite

Abstract. For the past two decades, developments in DRAM technology, the primary technology for the main memory of computers, have been directed towards increasing density. As a result 256 M-bit memory chips are now commonplace, and we can expect to see systems shipping in volume with 1 G-bit memory chips within the next two years. Although densities of DRAMs have quadrupled every 3 years, access speed has improved much less dramatically. This is in contrast to developments in processor technology where speeds have doubled nearly every two years. The resulting "memory gap" has been widely commented on. The solution to this gap until recently has been to use caches. In the past several years, DRAM manufacturers have explored new DRAM structures that could help reduce this gap, and reduce the reliance on complex multilevel caches. The new structures have not changed the basic storage array that forms the core of a DRAM; the key changes are in the interfaces. This paper presents an overview of these new DRAM structures.

“…For example, a packet buffer built using currently available DRAM would require a 16 000-bit-wide data bus. 1 The purpose of this paper is not to argue that line rates will continue to increase; on the contrary, it could be argued that DWDM will lead to a larger number of logical channels each operating no faster than, say, 10 Gb/s. We simply make the observation that if line rates do increase, then memory bandwidth limitations may make packet buffers and, hence, packet switches difficult or impossible to implement.…”

Section: Introduction (mentioning)

confidence: 97%

“…Obviously, this does not meet our goal for memory speed. 1 At the time of writing, the random access time (the time to retrieve data at random from any memory location) of a DRAM is approximately 50 ns. Although the access time will be reduced over time, the rate of improvement is much slower than Moore's Law [1].…”

Section: Introduction (mentioning)

confidence: 99%

Analysis of the parallel packet switch architecture

Iyer, McKeown 2003

IEEE/ACM Trans. Networking

Abstract: Our work is motivated by the desire to design packet switches with large aggregate capacity and fast line rates. In this paper, we consider building a packet switch from multiple lower speed packet switches operating independently and in parallel. In particular, we consider a (perhaps obvious) parallel packet switch (PPS) architecture in which arriving traffic is demultiplexed over identical lower speed packet switches, switched to the correct output port, then recombined (multiplexed) before departing from the system. Essentially, the packet switch performs packet-by-packet load balancing, or inverse multiplexing, over multiple independent packet switches. Each lower speed packet switch operates at a fraction of the line rate . For example, each packet switch can operate at rate . It is a goal of our work that all memory buffers in the PPS run slower than the line rate. Ideally, a PPS would share the benefits of an output-queued switch, i.e., the delay of individual packets could be precisely controlled, allowing the provision of guaranteed qualities of service. In this paper, we ask the question: Is it possible for a PPS to precisely emulate the behavior of an output-queued packet switch with the same capacity and with the same number of ports? We show that it is theoretically possible for a PPS to emulate a first-come first-served (FCFS) output-queued (OQ) packet switch if each lower speed packet switch operates at a rate of approximately 2. We further show that it is theoretically possible for a PPS to emulate a wide variety of quality-of-service queueing disciplines if each lower speed packet switch operates at a rate of approximately 3 . It turns out that these results are impractical because of high communication complexity, but a practical high-performance PPS can be designed if we slightly relax our original goal and allow a small fixed-size coordination buffer running at the line rate in both the demultiplexer and the multiplexer.
We determine the size of this buffer and show that it can eliminate the need for a centralized scheduling algorithm, allowing a full distributed implementation with low computational and communication complexity. Furthermore, we show that if the lower speed packet switch operates at a rate of (i.e., without speedup), the resulting PPS can emulate an FCFS-OQ switch within a delay bound.
Index Terms: Clos network, inverse multiplexing, load balancing, output queueing, packet switch.
