The DIVA (Data IntensiVe Architecture) system incorporates a collection of Processing-In-Memory (PIM) chips as smart-memory co-processors to a conventional microprocessor. We have recently fabricated prototype DIVA PIMs. These chips represent the first smart-memory devices designed to support virtual addressing and capable of executing multiple threads of control. In this paper, we describe the prototype PIM architecture. We emphasize three unique features of DIVA PIMs, namely, the memory interface to the host processor, the 256-bit wide datapaths for exploiting on-chip bandwidth, and the address translation unit. We present detailed simulation results on eight benchmark applications. When just a single PIM chip is used, we achieve an average speedup of 3.3X over host-only execution, due to lower memory stall times and increased fine-grain parallelism. These 1-PIM results suggest that a PIM-based architecture with many such chips yields significantly higher performance than a multiprocessor of a similar scale and at a much reduced hardware cost.
The Data-Intensive Architecture (DIVA) system employs Processing-In-Memory (PIM) chips as smartmemory coprocessors. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications, including multimedia applications and pointer-based and sparse-matrix computations. The DIVA project has built a prototype development system using PIM chips in place of standard DRAMs to demonstrate these concepts. We have recently ported several demonstration kernels to this platform and have exhibited a speedup of 35X on a matrix transpose operation.This paper focuses on the 32-bit scalar and 256-bit WideWord integer processing components of the first DIVA prototype PIM chip, which was fabricated in TSMC 0.18 µm technology. In conjunction with other publications, this paper demonstrates that impressive gains can be achieved with very little "smart" logic added to memory devices. A second PIM prototype that includes WideWord floating-point capability is scheduled to tape out in August 2003.
This paper first presents an accurate and efficient method of estimating the short circuit energy dissipation and the output transition time of CMOS buffers. Next, the paper describes a sizing method for tapered buffer chains. It is shown that the first-order sizing behavior, which considers only the capacitive energy dissipation, can be improved by considering the shortcircuit dissipation as well, and that the second-order polynomial expressions for short-circuit energy improves the accuracy over linear expressions. These results are used to derive sizing rules for buffered chains, which optimize the overall energy-delay product.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.