<p>Manycore processors are generally implemented as an array of small processing elements (PEs) interconnected by a mesh network-on-chip (NoC). This article describes a clock system for such chips, with many thousands of high-frequency PEs.</p> <p>Each PE contains a low-energy oscillator. It synchronizes with its four neighbors through an additional low-voltage wire that runs parallel to the communication links and carries a sinusoidal signal. This wire is part of a resonant circuit that extends to all PE oscillators. Theoretically, in an infinite mesh the oscillators will all be phase-locked, but in a finite mesh there will be fringe effects. In a mesh with 25×25 oscillators, the maximum skew between neighboring regions is within 3.3 ps. By slightly adjusting the free-running frequency of the oscillators, the skew can be reduced to 1.2 ps.</p> <p>Because there is no central clock, both power consumption and clock frequency can be improved compared to a conventional clock distribution network. A PE of 150×150 μm² running at 6.7 GHz with 93 master-slave flip-flops is used as an example. The PE-internal clock skew is less than 2.3 ps, and the power consumption of the clock system is 807 μW per PE. This corresponds to an effective gate and wire capacitance of 509 aF, or 7.3 gate capacitances.</p> <p>Scheduling the local oscillators gradually along one of the grid’s axes reduces the power noise. In this way, surge currents, which generally have their peaks at the clock edges, are distributed evenly over a full clock cycle.</p>
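<p>To put the quoted figures in proportion, the following minimal sketch derives the clock period, the clock energy per cycle, and each skew as a fraction of the period. It uses only the numbers stated above; the function name <code>clock_budget</code> and the dictionary keys are illustrative assumptions, and the per-flip-flop value is a plain average over the whole clock system, not a measured flip-flop energy.</p>

```python
def clock_budget(freq_hz, power_w, n_flipflops, skews_ps):
    """Derive per-cycle figures from a clock frequency and clock power.

    All names here are illustrative; only the input numbers come from the text.
    """
    period_ps = 1e12 / freq_hz                  # one clock period in picoseconds
    energy_fj = power_w / freq_hz * 1e15        # clock energy per cycle in femtojoules
    return {
        "period_ps": period_ps,
        "energy_per_cycle_fj": energy_fj,
        # Average over all flip-flops; includes oscillator and wire energy.
        "energy_per_flipflop_fj": energy_fj / n_flipflops,
        # Each skew expressed as a fraction of the clock period.
        "skew_fraction": {name: s / period_ps for name, s in skews_ps.items()},
    }


# Figures quoted in the abstract: 6.7 GHz, 807 uW per PE, 93 flip-flops.
budget = clock_budget(
    freq_hz=6.7e9,
    power_w=807e-6,
    n_flipflops=93,
    skews_ps={"mesh": 3.3, "mesh_tuned": 1.2, "pe_internal": 2.3},
)

for key, value in budget.items():
    print(key, value)
```

<p>With these inputs the period comes out at roughly 149 ps and the clock energy at roughly 120 fJ per PE per cycle, so the quoted skews of 3.3 ps, 1.2 ps and 2.3 ps are about 2.2 %, 0.8 % and 1.5 % of a clock period.</p>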
<p>The Bubble NoC is simple, but still provides outstanding performance. Flow control is implemented by <em>bubbles</em>, which are inserted between the flits. The algorithm resembles a traffic situation where a vehicle only moves if the next position is empty. When a flit moves, a bubble is created behind it, and when traffic is blocked, the bubbles collapse as the flits behind pack together. Even when the Bubble NoC is saturated, it degrades gracefully, and execution continues.</p> <p>Deterministic prerouting is used, with the address stored as markers in a 2-out-of-32 code. The XY routing algorithm shifts the address one step at each hop, and turns or finishes when a marker is at the front of the address.</p> <p>The physical implementation is a mesh of <em>lanes</em> containing time-division multiplexed links of 38 wires carrying a 32-bit payload. Signaling is done by current injection that charges the wires. A switch is placed in a four-way crossing, with a fifth local connection into a lane. The switch has an input register for each approaching lane. Straight-ahead traffic is simply let through, and a diagonal gate is used for the turning traffic.</p> <p>All switches are transmission gates, and the control is distributed as a sidewalk in a few μm of the periphery that surrounds the intersection. In a 14 nm technology, the lanes are 7 μm wide, the crossing is 17 μm square, the hop frequency is 3.3 GHz, and the energy for a datapath is 4.1 fJ/bit/hop (150 μm).</p>
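<p>The traffic analogy for the flow control can be illustrated with a minimal sketch of a single one-dimensional lane. It is a behavioral model under stated assumptions, not the hardware described here: slots hold either a flit or <code>None</code> (a bubble), all moves in a cycle are decided from the state at the start of that cycle, and the function name <code>step</code> and the <code>sink_ready</code> flag are hypothetical.</p>

```python
def step(lane, sink_ready=True):
    """Advance a lane by one hop cycle.

    lane[0] is the upstream end and lane[-1] the downstream end.
    A slot holds a flit (any value) or None, which stands for a bubble.
    Returns the new lane and the flit delivered to the sink, if any.
    """
    nxt = list(lane)
    delivered = None

    # The last flit leaves only if the receiver accepts it this cycle.
    if lane[-1] is not None and sink_ready:
        delivered = lane[-1]
        nxt[-1] = None

    # A flit moves only if the next slot was empty at the start of the cycle.
    # Moving leaves a bubble behind; blocked flits stay where they are.
    for i in range(len(lane) - 2, -1, -1):
        if lane[i] is not None and lane[i + 1] is None:
            nxt[i + 1] = lane[i]
            nxt[i] = None

    return nxt, delivered


# Three back-to-back flits heading toward a blocked receiver.
lane = ["A", "B", "C", None, None]
for cycle in range(4):
    lane, _ = step(lane, sink_ready=False)
    print(cycle + 1, lane)
```

<p>In this run the flits first spread out with bubbles between them and then pack together against the blocked end, which is the collapse described above. Once the receiver accepts flits again, the packed flits start moving one at a time and bubbles reappear behind them.</p>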
<div>The Bubble NoC is based on simplicity and provides outstanding performance. Flow control is implemented by <i>bubbles</i>, which are inserted between the flits. The logic resembles a traffic situation where a vehicle only moves if the next position is empty. When a flit moves, a bubble is created behind it, and when traffic is blocked, the bubbles collapse as the flits behind pack together. Even when the Bubble NoC is saturated, it degrades gracefully, and execution continues.</div><div> Deterministic prerouting is used, with the address stored as markers in a 2-out-of-32 code. The routing algorithm shifts the address one step at each hop and turns or finishes when a marker starts the address.</div><div> The physical implementation is a mesh of <i>streets</i> containing duplex links of 38 wires carrying a 32-bit payload. Signaling is based on current injection that charges the wires. A switch is placed in a four-way crossing, with a fifth local connection into a street. The switch contains input registers for each approaching street. Straight-through traffic is simply passed on, and a diagonal gate is used for turning traffic.</div><div> All switches are bidirectional transmission gates, and the control is distributed as a sidewalk in a few μm of the periphery surrounding the intersection. In a 14 nm technology, the streets are 8 μm wide, the crossing is 17 μm square, the hop frequency is 6.67 GHz, and the energy for a datapath is 4.1 fJ/bit/hop (150 μm).</div>
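<div>The marker-based prerouting can be sketched as follows. This is one possible reading, not the published encoding: it assumes the flit travels along X first, that the least significant bit is the front of the address, that the first marker means "turn" and the second means "deliver", and that offsets are non-negative. Direction signs are not covered, and because a turn also counts as a hop in this model, a destination on the same row cannot be expressed. The names <code>encode</code> and <code>route</code> are hypothetical.</div>

```python
ADDRESS_BITS = 32  # 2-out-of-32 code: exactly two bits are set


def encode(dx, dy):
    """Build an address that turns after dx hops and delivers after dy more.

    Assumption of this sketch: the hop taken while turning counts toward dy,
    so dy must be at least 1.
    """
    if dx < 0 or dy < 1 or dx + dy >= ADDRESS_BITS:
        raise ValueError("offset not representable in this sketch")
    return (1 << dx) | (1 << (dx + dy))


def route(address):
    """Follow an address from (0, 0); return the final position and a trace."""
    if not 0 < address < (1 << ADDRESS_BITS) or bin(address).count("1") != 2:
        raise ValueError("address must contain exactly two markers")

    x = y = 0
    heading_x = True
    trace = []
    while True:
        marker = address & 1   # inspect the front of the address
        address >>= 1          # shift one step per hop
        if marker and address == 0:
            trace.append("deliver")     # second marker: finish here
            return (x, y), trace
        if marker:
            heading_x = False           # first marker: turn from X to Y
            trace.append("turn")
        else:
            trace.append("straight")
        if heading_x:
            x += 1
        else:
            y += 1


position, trace = route(encode(2, 1))
print(position, trace)
```

<div>Each switch in this model only inspects one bit and shifts the address, which is in keeping with the emphasis on simple switches. How the actual design encodes direction and same-row destinations is not stated here.</div>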