While Processing-in-Memory has been investigated for decades, it has not been embraced commercially. A number of emerging technologies have renewed interest in this topic. In particular, the emergence of 3D stacking and the imminent release of Micron's Hybrid Memory Cube device have made it more practical to move computation near memory. However, the literature lacks a detailed analysis of a killer application that can leverage a Near Data Computing (NDC) architecture. This paper focuses on in-memory MapReduce workloads that are commercially important and especially suitable for NDC because of their embarrassing parallelism and largely localized memory accesses. The NDC architecture incorporates several simple processing cores on a separate, non-memory die in a 3D-stacked memory package; these cores can perform Map operations with efficient memory access and without hitting the bandwidth wall. This paper describes and evaluates a number of key elements necessary in realizing efficient NDC operation: (i) low-EPI (energy-per-instruction) cores, (ii) long daisy chains of memory devices, and (iii) dynamic activation of cores and SerDes links. Compared to a baseline that is heavily optimized for MapReduce execution, the NDC design yields up to a 15X reduction in execution time and an 18X reduction in system energy.
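To make the locality argument concrete, the sketch below (our illustration, not code from the paper) shows a Map task as it might run on one near-memory core. The kernel streams sequentially through only the DRAM slice stacked above its core, so no Map-phase traffic crosses the processor-memory bus; the slice size and function names are assumptions for illustration.

```c
/* Hypothetical NDC Map task: each core counts words in its own local
 * memory slice. Accesses are sequential and confined to the slice, which
 * is what makes Map phases a good fit for near-data execution. */
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define SLICE_BYTES (64u * 1024u * 1024u)  /* assumed per-core memory slice */

uint64_t map_word_count(const char *slice_base, size_t slice_len)
{
    uint64_t words = 0;
    int in_word = 0;
    for (size_t i = 0; i < slice_len; i++) {        /* purely sequential scan */
        int is_sep = isspace((unsigned char)slice_base[i]);
        if (!is_sep && !in_word)
            words++;                                 /* a new word starts here */
        in_word = !is_sep;
    }
    return words;  /* one partial result per core */
}
```

Each core would produce one partial count; a later Reduce step on the host (or on the NDC cores themselves) merges the per-slice results.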
DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, where a single request activates thousands of bitlines in many DRAM chips, only to return a single cache line to the CPU. The focus on cost-per-bit is questionable in modern-day servers, where operating costs can easily exceed the purchase cost. Modern technology trends are also placing very different demands on the memory system: (i) queuing delays are a significant component of memory access time, (ii) there is a high energy premium for the level of reliability expected for business-critical computing, and (iii) the memory access stream emerging from multi-core systems exhibits limited locality. All of these trends necessitate an overhaul of DRAM architecture, even if it means a slight compromise in the cost-per-bit metric. This paper examines three primary innovations. The first is a modification to DRAM chip microarchitecture that retains the traditional DDRx SDRAM interface. Selective Bitline Activation (SBA) waits for both RAS (row address) and CAS (column address) signals to arrive before activating exactly those bitlines that provide the requested cache line. SBA reduces energy consumption while incurring slight area and performance penalties. The second innovation, Single Subarray Access (SSA), fundamentally re-organizes the layout of DRAM arrays and the mapping of data to these arrays so that an entire cache line is fetched from a single subarray. It requires a different interface to the memory controller, reduces dynamic and background energy (by about 6X), incurs a slight area penalty (4%), and can even lead to performance improvements (up to 10%) by reducing queuing delays. The third innovation further penalizes the cost-per-bit metric by adding a checksum feature to each cache line. This checksum error-detection feature can then be used to build stronger RAID-like fault tolerance, including chipkill-level reliability. Such a technique is especially crucial for the SSA architecture, where the entire cache line is localized to a single chip. This DRAM chip microarchitectural change leads to a dramatic reduction in the energy and storage overheads for reliability. The proposed architectures will also apply to other emerging memory technologies (such as resistive memories) and will be less disruptive to standards, interfaces, and the design flow if they can be incorporated into first-generation designs.
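The core of SSA is the address-to-array mapping: every cache line must land entirely within one subarray of one chip. The sketch below is our illustration of one plausible such mapping, not the paper's exact scheme; the geometry constants (line size, chips, subarrays, rows) are assumed values.

```c
/* Illustrative SSA-style address decomposition: all 64 bytes of a cache
 * line are placed in a single subarray of a single chip, so a read need
 * only activate that subarray's bitlines. Consecutive lines are spread
 * across chips and subarrays to retain parallelism. Geometry is assumed. */
#include <stdint.h>

#define LINE_BYTES    64
#define CHIPS         8
#define SUBS_PER_CHIP 64
#define LINES_PER_ROW 128    /* assumed 8 KB row per subarray */
#define ROWS_PER_SUB  4096

typedef struct { unsigned chip, subarray, row, line; } ssa_loc;

ssa_loc ssa_map(uint64_t paddr)
{
    uint64_t n = paddr / LINE_BYTES;   /* whole line -> one subarray */
    ssa_loc loc;
    loc.chip     = (unsigned)(n % CHIPS);         n /= CHIPS;
    loc.subarray = (unsigned)(n % SUBS_PER_CHIP); n /= SUBS_PER_CHIP;
    loc.line     = (unsigned)(n % LINES_PER_ROW); n /= LINES_PER_ROW;
    loc.row      = (unsigned)(n % ROWS_PER_SUB);
    return loc;
}
```

Because the low-order line bits select the chip before the subarray, adjacent cache lines fan out across independent subarrays, which is one way queuing delays could shrink even though each individual access touches far fewer bitlines.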
Abstract. Bovine spongiform encephalopathy (BSE) is a transmissible spongiform encephalopathy of cattle, first detected in 1986 in the United Kingdom and subsequently in other countries. It is the most likely cause of variant Creutzfeldt-Jakob disease (vCJD) in humans. PrPSc from case 1 showed molecular features similar to typical BSE isolates, whereas PrPSc from case 2 revealed an unusual molecular PrPSc pattern: the molecular masses of the unglycosylated and monoglycosylated isoforms were higher than those of typical BSE isolates, and case 2 was strongly labeled with antibody P4, which is consistent with a higher molecular mass. Sequencing of the prion protein gene of both BSE-positive animals revealed that the sequences of both animals were within the range of the prion protein gene sequence diversity previously reported for cattle.
Modern processors such as Tilera's Tile64, Intel's Nehalem, and AMD's Opteron are migrating memory controllers (MCs) on-chip, while maintaining a large, flat memory address space. This trend to utilize multiple MCs will likely continue, and a core or socket will consequently need to route memory requests to the appropriate MC via an inter- or intra-socket interconnect fabric similar to AMD's HyperTransport or Intel's QuickPath Interconnect. Such systems are therefore subject to non-uniform memory access (NUMA) latencies because of the time spent traveling to remote MCs. Each MC will act as the gateway to a particular piece of the physical memory. Data placement will therefore become increasingly critical in minimizing memory access latencies. To date, no prior work has examined the effects of data placement among multiple MCs in such systems. Future chip-multiprocessors are likely to comprise multiple MCs and an even larger number of cores, which will increase the memory access latency variation in these systems. Proper allocation of workload data to the appropriate MC will be important in reducing the latency of memory requests. The allocation strategy will need to be aware of queuing delays, on-chip latencies, and row-buffer hit rates at each MC. In this paper, we propose dynamic mechanisms that take these factors into account when placing data in appropriate slices of the physical memory. We introduce adaptive first-touch page placement and dynamic page-migration mechanisms to reduce DRAM access delays in multi-MC systems. These policies yield average performance improvements of 17% for adaptive first-touch page placement and 35% for a dynamic page-migration policy.
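A minimal sketch of how adaptive first-touch placement could weigh the factors the abstract names (queuing delay, on-chip latency, row-buffer hit rate) appears below. It is our illustration under assumed statistics fields and weights, not the paper's actual cost function.

```c
/* Hypothetical adaptive first-touch placement: on a page's first access,
 * choose the MC that minimizes a weighted cost. The stats fields and the
 * weights are illustrative assumptions. */
typedef struct {
    double avg_queue_delay;   /* average cycles a request waits at this MC  */
    double hop_latency;       /* on-chip network latency from the core      */
    double row_hit_rate;      /* fraction of accesses hitting an open row   */
} mc_stats;

/* Lower cost is better; a high row-buffer hit rate lowers the cost. */
static double mc_cost(const mc_stats *s)
{
    return 1.0 * s->avg_queue_delay
         + 0.5 * s->hop_latency
         - 2.0 * s->row_hit_rate * 100.0;   /* illustrative weights */
}

int pick_mc_first_touch(const mc_stats stats[], int num_mcs)
{
    int best = 0;
    for (int mc = 1; mc < num_mcs; mc++)
        if (mc_cost(&stats[mc]) < mc_cost(&stats[best]))
            best = mc;
    return best;   /* allocate the page's frame behind this controller */
}
```

A dynamic page-migration policy would re-run a similar comparison periodically using updated statistics and move hot pages whose current MC is no longer the cheapest.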