Parallelism is inherent in most problems but due to current programming models and architectures which have evolved from a sequential paradigm, the parallelism exploited is restricted. We believe that the most efficient parallel execution is achieved when applications are represented as graphs of operations and data, which can then be mapped for execution on a modular and scalable processing-in-memory architecture. In this paper, we present PHOENIX, a general-purpose architecture composed of many Processing Elements (PEs) with memory storage and efficient computational logic units interconnected with a mesh network-on-chip. A preliminary design of PHOENIX shows it is possible to include 10,000 PEs with a storage capacity of 0.6GByte on a 1.5cm 2 chip using 14nm technology. PHOENIX may achieve 6TFLOPS with a power consumption of up to 42W, which results in a peak energy efficiency of at least 143GFLOPS/W. A simple estimate shows that for a 4K FFT, PHOENIX achieves 117GFLOPS/W which is more than double of what is achieved by state-of-the-art systems.
<div>The Bubble NoC is based on simplicity and provides outstanding performance. Flow control is implemented by <i>bubbles</i>, which are inserted between the flits. The logic resembles a traffic situation where a vehicle only moves if the next position is empty. When a flit moves, a bubble is created behind it, and when there is a blocking the bubbles are collapsed as the flits behind are packed together. Even when the Bubble NoC is saturated, it degrades gracefully, and the execution continues.</div><div> Deterministic prerouting is used, with the address stored as markers in a 2 out of 32 code. The routing algorithm shifts the address one step at each hop and turns or finishes when a marker starts the address.</div><div> The physical implementation is a mesh of <i>streets</i> containing duplex links of 38 wires carrying 32-bit payload. Signaling is based on current injection that charges the wires. A switch is placed in a four-way crossing, with a fifth local connection into a street. The switch contains input registers for each approaching street. Straight through traffic is simply passed on, and a diagonal gate is used for turning traffic.</div><div> All switches are bidirectional transmission gates, and the control is distributed as a sidewalk in a few µm of the periphery surrounding the intersection. In a 14 nm technology, the streets are 8 μm wide, the crossing is 17 μm in square, the hop frequency 6.67 GHz and the energy for a datapath 4.1 fJ/bit/hop (150 µm).</div>
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.