Design and evaluation of dynamic access ordering hardware

McKee, Sally A.; Aluwihare, Assaji; Clark, Benjamin; Klenke, Robert H.; Landon, Trevor C.; Oliver, Christopher William; Salinas, M.H.; Szymkowiak, Adam E.; Wright, Kenneth L.; Wulf, Wm. A.; Aylor, J.H.

doi:10.1145/237578.237594

Cited by 29 publications

(19 citation statements)

References 16 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The graphs show that the locality of reference for each application (seen in the row-buffer hit-rates, Figure 19) can have a dramatic impact on the access latency-for example, there is a 10% to 90% difference between the average access latencies for li and perl. This effect has been seen beforeMcKee's work shows that intentionally reordering memory accesses to exploit locality can have an order of magnitude effect on memory-system performance [24,25].…”

Section: Perfect-width Busesmentioning

confidence: 70%

See 1 more Smart Citation

High-performance DRAMs in workstation environments

Cuppu

Jacob

Davis

et al. 2001

IEEE Trans. Comput.

View full text Add to dashboard Cite

I NTRODUCTIONIn response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures. This paper presents a simulation-based performance study of a representative group, evaluating each in terms of its effect on total execution time. While there are a number of academic proposals for new DRAM designs, space limits us to covering only existing commercial architectures. To obtain accurate memory-request timing for an aggressive out-of-order processor, we integrate our code into the SimpleScalar tool set [4]. This paper presents a baseline study of a small-system DRAM organization : these are systems with only a handful of DRAM chips (0.1-1GB). We do not consider large-system DRAM organizations with many gigabytes of storage that are highly interleaved. We also study a set of benchmarks that are appropriate for such systems: user-class applications such as compilers and small databases rather than server-class applications such as transaction processing systems. The study asks and answers the following questions:• What is the effect of improvements in DRAM technology on the memory latency and bandwidth problems?Contemporary techniques for improving processor performance and tolerating memory latency are exacerbating the memory bandwidth problem [5]. Our results show that current DRAM architectures are attacking exactly this problem: the most recent technologies (SDRAM, ESDRAM, DDR, and Rambus) have reduced the stall time due to limited bandwidth by a factor of three compared to earlier DRAM architectures. However, the memory-latency component of overhead has not improved.• Where is time spent in the primary memory system (the memory system beyond the cache hierarchy, but not including secondary [disk] or tertiary [backup] storage)? What is the performance benefit of exploiting the page mode of contemporary DRAMs?For the newer DRAM designs, the time to extract the required data from the sense amps/row caches for transmission on the memory bus is the largest component in the average access time, though page mode allows this to be overlapped with column access and the time to transmit the data over the memory bus.• How much locality is there in the address stream that reaches the primary memory system?The stream of addresses that miss the L2 cache contains a significant amount of locality, as measured by the hit-rates in the DRAM row buffers. The hit rates for the applications studied range 2-97%, with a mean hit rate of 40% for a 1MB L2 cache. (This does not include hits to the row buffers when making multiple DRAM requests to read one cache-line.) High-Performance DRAMs in Workstation EnvironmentsVinodh Cuppu, Student Member, IEEE, Bruce Jacob, Member, IEEE, Brian Davis, Member, IEEE, Trevor Mudge, Fellow, IEEE Abstract -This paper presents a simulation-based performance study of several of the new high-performance DRAM architectures, each evaluated in a small system organization. These small-system organizations correspond to workstation-class c...

show abstract

Section: Perfect-width Busesmentioning

confidence: 70%

“…We do not look at the floating-point benchmarks here because their regular access patterns make them easy targets for optimizations such as prefetching and access reordering [24,25]. …”

Section: Total Execution Timementioning

confidence: 99%

High-performance DRAMs in workstation environments

Cuppu

Jacob

Davis

et al. 2001

IEEE Trans. Comput.

View full text Add to dashboard Cite

show abstract

“…The graphs show that the locality of reference for each application (seen in the row-buffer hit-rates, Figure 13) can have a dramatic impact on the access latency-for example, there is a factor of two difference between the average access latency for compress and perl. This effect has been seen before-McKee's work shows that intentionally reordering memory accesses to exploit locality can have an order of magnitude effect on memory-system performance [21,22].…”

Section: Perfect-width Busesmentioning

confidence: 71%

A performance comparison of contemporary DRAM architectures

CuppuVinodh

JacobBruce

DavisBrian

et al. 1999

SIGARCH Comput. Archit. News

View full text Add to dashboard Cite

show abstract

“…Similar data alignment problems were addressed in mas sively parallel SIMD computers such as the CM2 [11]. The work in the area of hardware support for memory access reordering such as [17] is also similar though it is not just restricted to overcome subword parallelism.…”

Section: Related Workmentioning

confidence: 93%