The architecture of the DIVA processing-in-memory chip

Draper, J.; Kang, Chang Woo; Kim, Ihn; Daglikoca, Gokhan; Chame, Jacqueline; Hall, Mary; Steele, Craig W.; Barrett, Timothy; LaCoss, Jeff; Granacki, John; Shin, Jaewook; Chen, Chun

doi:10.1145/514195.514197

Cited by 24 publications

(27 citation statements)

References 0 publications

“…The DIVA architecture [9] was developed by Draper, Hall, and others at USC ISI to provide a multicore scalable PIM architecture for a wide array of general applications including scalable embedded applications. This PIM architecture incorporated a simple mechanism for message (parcel) driven computation and supported a network that permitted the interconnection of a number of such components to work together in parallel on the same application.…”

Section: Related Research In the Field (mentioning)

confidence: 99%

The “MIND” scalable PIM architecture

Sterling

Brodowicz

2005

Grid Computing the New Frontier of High Performance Computing

MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.

“…DIVA targets applications that are not aided by caches in conventional systems due to little spatial or temporal data locality and are thus severely impacted by the processor-memory bottleneck. Based on our first PIM implementation, a PIM system incorporating these devices is projected to achieve speedups ranging from 8.8 to 38.3 over conventional workstations for a number of applications [2]. Since DIVA PIM chips serve primarily as memory components, it is important to preserve a large majority of the die area for memory, so the processing logic for such PIM chips should be compacted as much as possible.…”

Section: Introduction (mentioning)

confidence: 99%

Design Trade-Offs in Floating-Point Unit Implementation for Embedded and Processing-In-Memory Systems

Kwon

Sondeen

Draper

2005 IEEE International Symposium on Circuits and Systems

“…Processing-in-memory (PIM) [1] has been proposed as a solution to the memory wall problem. It yields dramatically increased memory bandwidth by the inherent nature of an embedded processor directly connected to a memory bank.…”

Section: Introduction (mentioning)

confidence: 99%

“…It yields dramatically increased memory bandwidth by the inherent nature of an embedded processor directly connected to a memory bank. Although processing-in-memory architectures like the Data-Intensive Architecture (DIVA) [1] have significant memory latency advantages over conventional systems, as fabrication technologies advance, latency to on-chip embedded DRAM (eDRAM) is increasing. Conventional systems have employed data caches and load/store queues (LSQ) to combat increasing latency.…”

Section: Introduction (mentioning)

confidence: 99%

Design trade-offs for load/store buffers in embedded processing environments

Kang

Draper

2007

2007 50th Midwest Symposium on Circuits and Systems

Abstract: Memory latency is a critical issue for conventional high-speed computing platforms, and it is becoming a common problem in embedded and CMP (chip multiprocessing) systems as well. Conventional processors typically adopt caches and a Load/Store Queue (LSQ) to address the processor-to-memory bottleneck. However, the conventional LSQ design, which has a large number of entries, is not appropriate for embedded systems due to its area- and power-hungry out-of-order speculation. A compact, low-power load/store buffer that also provides significant performance improvement is essential for such systems. In this paper, we propose an area-efficient WideWord Load/Store Buffer (WLSB) which supports both WideWord (256-bit) and scalar (32-bit) load/store instructions for a recently fabricated PIM (Processing-In-Memory) device [1]. Given its small size, the 4-entry WLSB yields a 57.33% load hit rate on SPEC2K [3] benchmarks. This result is 5.72% better as compared to a less area-efficient 32-entry fully associative scalar load/store buffer (SLSB). The WLSB was synthesized in IBM 90 nm technology, and the resulting implementation occupies less than a seventh of a square mm and is projected to run at 1.6 ns cycle time with 15.72 mW of dynamic power dissipation. This paper demonstrates how this very small-entry buffer can affect the load hit rate and quantifies the design trade-offs between wide small-entry and narrow large-entry buffers with respect to size, power, load hit ratio and clock speed. Although this WLSB has been specifically designed to benefit a PIM architecture, it is expected to be useful for other embedded processing platforms and CMPs due to emphasized area/power constraints.
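The trade-off between wide small-entry and narrow large-entry buffers described in this abstract can be illustrated with a toy model. The sketch below is not the WLSB design from the paper: it assumes a fully associative buffer with LRU replacement that serves loads only, and the function name `load_hit_rate` and the sequential address trace are hypothetical. It only shows why a few 256-bit entries can capture spatially adjacent 32-bit loads that a larger scalar buffer misses.

```python
from collections import OrderedDict


def load_hit_rate(addresses, entries, entry_bytes):
    """Hit rate of a toy fully associative, LRU-replaced load buffer.

    Assumptions (not from the paper): loads only, no stores, no
    speculation; each entry covers one aligned block of entry_bytes.
    """
    if not addresses:
        return 0.0
    buffer = OrderedDict()  # block tag -> None, least recently used first
    hits = 0
    for addr in addresses:
        tag = addr // entry_bytes  # aligned block that contains this byte
        if tag in buffer:
            hits += 1
            buffer.move_to_end(tag)  # mark as most recently used
        else:
            if len(buffer) >= entries:
                buffer.popitem(last=False)  # evict least recently used
            buffer[tag] = None
    return hits / len(addresses)


# Hypothetical trace: 64 sequential 32-bit (4-byte) loads.
trace = [4 * i for i in range(64)]

wide = load_hit_rate(trace, entries=4, entry_bytes=32)    # 4 x 256-bit entries
narrow = load_hit_rate(trace, entries=32, entry_bytes=4)  # 32 x 32-bit entries
print(wide, narrow)
```

On this purely sequential trace the wide buffer hits on seven of every eight loads while the scalar buffer never hits, since no address repeats. Real workloads such as the SPEC2K benchmarks cited above mix spatial and temporal reuse, so the measured gap is far smaller than this toy case suggests.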
