Chip-multiprocessors (CMPs) are quickly becoming popular in embedded systems. However, the practical success of CMPs strongly depends on addressing the difficulty of multithreaded application development for such systems. Transactional Memory (TM) promises to simplify concurrency management in multithreaded applications by allowing programmers to specify coarse-grain parallel tasks, while achieving performance comparable to fine-grain lock-based applications. This paper presents ATLAS, the first prototype of a CMP with hardware support for transactional memory. ATLAS includes 8 embedded PowerPC cores that access coherent shared memory in a transactional manner. The data cache for each core is modified to support the speculative buffering and conflict detection necessary for transactional execution. We have mapped ATLAS to the BEE2 multi-FPGA board to create a full-system prototype that operates at 100MHz, boots Linux, and provides significant performance and ease-of-use benefits for a range of parallel applications. Overall, the ATLAS prototype provides an excellent framework for further research on the software and hardware techniques necessary to deliver on the potential of transactional memory.
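The two mechanisms the abstract names, speculative buffering and conflict detection, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the ATLAS hardware design: the class names (`SharedMemory`, `Transaction`, `Conflict`), the `atomic` helper, and the per-address version counter used to validate reads at commit time are all hypothetical choices made for illustration; the abstract does not say how ATLAS detects conflicts inside its data caches.

```python
class Conflict(Exception):
    """Raised at commit when an address this transaction read has since changed."""


class SharedMemory:
    """Committed state only; speculative values never appear here."""

    def __init__(self):
        self.data = {}     # address -> committed value
        self.version = {}  # address -> count of commits that wrote the address


class Transaction:
    """Hypothetical model: buffer writes privately, validate reads at commit."""

    def __init__(self, mem):
        self.mem = mem
        self.read_set = {}   # address -> version seen at first read
        self.write_buf = {}  # address -> speculative value (invisible to others)

    def read(self, addr):
        if addr in self.write_buf:  # a transaction sees its own writes
            return self.write_buf[addr]
        self.read_set.setdefault(addr, self.mem.version.get(addr, 0))
        return self.mem.data.get(addr, 0)

    def write(self, addr, value):
        self.write_buf[addr] = value  # speculative buffering

    def commit(self):
        # Conflict detection: every address we read must be unchanged.
        for addr, seen in self.read_set.items():
            if self.mem.version.get(addr, 0) != seen:
                raise Conflict(addr)
        # No conflict: publish the buffered writes.
        for addr, value in self.write_buf.items():
            self.mem.data[addr] = value
            self.mem.version[addr] = self.mem.version.get(addr, 0) + 1


def atomic(mem, body, max_retries=10):
    """Run body(tx) as a transaction, re-executing it if a conflict is found."""
    for _ in range(max_retries):
        tx = Transaction(mem)
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue  # discard the write buffer and retry
    raise RuntimeError("transaction did not commit")
```

In this model the programmer marks a coarse-grain region (the `body` function) and never takes a lock; a conflicting transaction simply discards its buffer and re-executes, which is the programming style the abstract attributes to transactional memory.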
Intel, Hillsboro, OR
The recent emphasis on power efficiency in serial I/O [1-4] reflects the growing need for lower-power chip-to-chip interfaces for computing systems. Board-level transceivers using a variety of low-power circuit techniques have demonstrated power efficiencies as low as 2.2mW/(Gb/s) across four data lanes [1]. Because power efficiency generally degrades as the per-lane data rate increases [2], low-power interfaces with high aggregate bandwidths must combine many parallel data lanes within the silicon, package and board area constraints. Parallel links can also reduce average power by disabling some or all lanes during periods of sub-peak bandwidth demand, but the efficiency and latency of this scheme are limited by wake-up time [4]. This paper describes a 470Gb/s binary NRZ parallel interface in 45nm CMOS that consumes 1.4mW/(Gb/s). The circuitry and interconnect were co-designed to minimize power and area. Power is reduced by sharing clocking within "bundles" of data lanes, minimizing the span of clock signals and pairing a low-swing TX driver with a sensitive RX sampler. Silicon area is minimized by using on-chip transmission lines (TLs) to redistribute clock and data signals, while a dense, top-side package connector enables intra-bundle delay matching. The interface also has fast (<5ns) RX standby wake-up and an integrated wake-up time measurement circuit to enable aggressive power management. The interface and channel topology are intended for CPU-to-CPU and CPU-to-memory communication.
Figure 8.1.1 shows the interconnect topology and link schematic. The link is asymmetric full-duplex with 19 lanes in one direction and 28 lanes in the opposite direction. The data lanes are organized into groups of 9 or 10, which are referred to as bundles. A single forwarded clock transmitter and injection-locked VCO (IL-VCO) are shared for each die.
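The headline figures above imply a few derived quantities that are easy to check. The sketch below is only arithmetic on the numbers quoted in the text, under two assumptions that the abstract does not state outright: that the 470Gb/s aggregate is the sum over both directions, and that all lanes run at the same rate. The function name `bundle_splits` is hypothetical.

```python
AGGREGATE_GBPS = 470.0        # quoted aggregate bandwidth
EFFICIENCY_MW_PER_GBPS = 1.4  # quoted power efficiency
LANES = (19, 28)              # quoted lanes per direction
BUNDLE_SIZES = (9, 10)        # quoted bundle sizes

total_lanes = sum(LANES)

# Assumption: equal per-lane rate, aggregate counted over both directions.
per_lane_gbps = AGGREGATE_GBPS / total_lanes

# Power = bandwidth x efficiency. Note 1 mW/(Gb/s) equals 1 pJ/bit.
total_power_mw = AGGREGATE_GBPS * EFFICIENCY_MW_PER_GBPS
energy_per_bit_pj = EFFICIENCY_MW_PER_GBPS


def bundle_splits(lanes, sizes=BUNDLE_SIZES):
    """Return every (n_small, n_large) with n_small*9 + n_large*10 == lanes."""
    small, large = sizes
    return [
        (a, (lanes - a * small) // large)
        for a in range(lanes // small + 1)
        if (lanes - a * small) % large == 0
    ]


if __name__ == "__main__":
    print(f"lanes: {total_lanes}, per-lane rate: {per_lane_gbps:.1f} Gb/s")
    print(f"interface power: {total_power_mw:.0f} mW")
    for n in LANES:
        print(f"{n} lanes -> (bundles of 9, bundles of 10): {bundle_splits(n)}")
```

Under these assumptions each lane carries 10Gb/s and the whole interface draws roughly 658mW. The bundle split is an inference rather than a stated fact: 19 lanes can only be formed as one bundle of 9 plus one of 10, and 28 lanes only as two bundles of 9 plus one of 10.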
The interconnect topology consists of two packaged dies connected to a bridge board through top-side package connectors. This topology is compatible with either a high-density interconnect (HDI) or Flex cable, but the HDI implementation is the focus of this work. The data signals for each bundle are routed on a single layer of the bridge, and all lanes within a bundle are length-matched to <100µm. This allows clock recovery to be done on a per-bundle basis. The package-to-bridge connector is a 500µm pitch LGA, which provides approximately 4X area density advantage over socket-to-PCB routing and facilitates length matching in the package breakout. The channel is continued on-die with length-matched TLs that route the data and clock signals to centrally located TX and RX bundle circuitry (Fig. 8.1.7). Each bundle occupies the area of only eight C4 bumps. The total area for active interface circuitry is 3.2mm².
Figure 8.1.2 shows the schematic for the TX portion of the interface. A supply-regulated IL-VCO generates the interface clock, emulating a system wherein multiple interfaces share a single PLL and filter the clock locally. Alternately, a per-interface PLL based...
Volume CXXXI, first issue. First section. I. Physics, Chemistry and Practical Pharmacy. *) Communicated from Casper's Vierteljahrsschrift für gerichtliche und öffentliche Medicin, vol. 3, issue 3, for use in the Archiv der Pharmacie. The Editors. Arch. d. Pharm., vol. CXXXI, issue 1.