Power consumption is the main hurdle in the race to design Exascale-capable computing systems, which will require deploying millions of computing elements. While this problem is being addressed by designing increasingly power-efficient processing subsystems, little effort has been put into reducing the power consumption of the interconnection network. This is precisely the objective of this work, in which we study the benefits, in terms of both area and power, of avoiding the costly and power-hungry CAM-based routing tables that are deep-rooted in all current networking technologies. We present our custom-made, FPGA-based router built around a simple arithmetic routing engine, which is shown to be far more power- and area-efficient than even a relatively small 2K-entry routing table: such a table requires as much area as our entire router and an order of magnitude more power.
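To make the contrast with table-based lookup concrete, the sketch below computes an output port arithmetically from the current and destination node IDs. The abstract does not specify the routing function or topology used, so a 2D mesh with row-major node numbering and dimension-order (XY) routing is assumed here purely for illustration.

```python
# Illustrative sketch only: a 2D mesh with dimension-order (XY) routing is
# assumed; the paper's actual routing function and topology are not given.

MESH_W = 8                                   # assumed nodes per mesh row
LOCAL, EAST, WEST, NORTH, SOUTH = range(5)   # router output ports

def arithmetic_route(current: int, dest: int) -> int:
    """Pick the output port from node coordinates alone -- no CAM lookup.

    Node IDs are laid out row-major, so coordinates are recovered with a
    divide/modulo; the comparisons below map onto a handful of subtractors
    and comparators in an FPGA instead of a large routing table.
    """
    cx, cy = current % MESH_W, current // MESH_W
    dx, dy = dest % MESH_W, dest // MESH_W
    if dx != cx:                     # correct the X dimension first
        return EAST if dx > cx else WEST
    if dy != cy:                     # then correct the Y dimension
        return NORTH if dy > cy else SOUTH
    return LOCAL                     # packet has reached its destination

# Example: a packet at node 10 (2,1) heading for node 45 (5,5) goes EAST.
print(arithmetic_route(10, 45))
```

The point of such a scheme is that the whole routing decision reduces to a few subtractions and comparisons, which is what allows it to undercut a 2K-entry CAM in both area and power.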
HPC architects are currently facing myriad challenges from ever tighter power constraints and changing workload characteristics. In this article we discuss the current state of FPGAs within HPC systems. Recent technological advances show that they are well placed for penetration into the HPC market. However, a number of research problems remain to be overcome; we address the requirements that system architectures and interconnects must meet to enable their proper exploitation, highlighting the necessity of allowing FPGAs to act as full-fledged peers within a distributed system rather than as devices attached to a CPU. We argue that this model requires a reliable, connectionless, hardware-offloaded transport supporting a global memory space. Our results show that our full hardware implementation reduces latency by up to 25% compared with a software-based transport, and that our solution can outperform the state of the art in HPC workloads such as matrix-matrix multiplication, achieving 10% higher computing throughput.
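From the software side, a "reliable, connectionless, hardware-offloaded transport supporting a global memory space" can be pictured as one-sided writes addressed by a (node, offset) pair, with no connection setup and with reliability handled below the API. The sketch below is hypothetical: the names GlobalAddress and put, the field widths, and the stand-in transmit queue are invented for illustration and do not come from the article.

```python
# Hypothetical sketch of a connectionless, globally addressed write.
# All names and field widths are assumptions, not the article's API.

from collections import deque
from dataclasses import dataclass

hw_tx_queue = deque()   # stand-in for the network interface's hardware TX queue

@dataclass(frozen=True)
class GlobalAddress:
    node: int     # destination peer in the global memory space
    offset: int   # byte offset within that peer's exposed memory window

def put(gaddr: GlobalAddress, payload: bytes) -> None:
    """One-sided write: every packet carries its full destination address,
    so no connect()/accept() handshake or per-connection state is needed.
    Retransmission and ordering are assumed to be handled by the
    hardware-offloaded transport, not by this code."""
    packet = (gaddr.node.to_bytes(2, "big")
              + gaddr.offset.to_bytes(6, "big")
              + payload)
    hw_tx_queue.append(packet)

# Write 64 bytes into node 3's window at offset 0x1000, with no prior setup.
put(GlobalAddress(node=3, offset=0x1000), bytes(64))
```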
Ongoing transistor scaling and the growing complexity of embedded system designs have led to the rise of MPSoCs (Multi-Processor Systems-on-Chip), which combine multiple hard-core CPUs and accelerators (FPGA, GPU) on the same physical die. These devices are of great interest to the supercomputing community, which is increasingly reliant on heterogeneity to achieve its power and performance goals in these closing stages of the race to exascale. In this paper, we present a network interface architecture and networking infrastructure designed to sit inside the FPGA fabric of a cutting-edge MPSoC device, enabling networks of these devices to communicate in both a distributed and a shared memory context, with reduced need for costly software networking system calls. We present our implementation and prototype system, discuss the main design decisions relevant to the use of the Xilinx Zynq UltraScale+, a state-of-the-art MPSoC, and describe the challenges to be overcome given the device's limitations and constraints. We demonstrate the working prototype system connecting two MPSoCs, with communication between a processor and a remote memory region and accelerator. We then discuss the limitations of the current implementation and highlight the improvements needed to make this solution production-ready.

KEYWORDS: distributed shared memory, FPGA, HPC, interconnect, MPSoC, networks

INTRODUCTION AND MOTIVATION

Over the past decade, the embedded systems landscape has changed dramatically due to the growing demands of the mobile market and the rise of the Internet of Things. These advances led to the SoC paradigm, with increasingly complex and heterogeneous systems placed on the same physical die. At the same time, the High Performance Computing (HPC) community has had to deal with the consequences of the breakdown of Dennard scaling [1], which has caused an explosion in the core count and power consumption of the largest machines as they keep pace with the demand for ever greater computing capability.

These two phenomena have created an opportunity for convergence between the HPC and data center markets, and have led to the use of low-power mobile processors, which are beginning to penetrate the server market [2] and are even being used by the RIKEN institute for the next stage of its roadmap to an Exascale-class machine, the post-K computer [3]. It is unsurprising that this shift is happening, given that the greatest challenge computer and system architects now face in the race to exascale is the need for ever greater energy efficiency. Naively scaling out current architectures, e.g., those in the TOP500, would result in an exascale machine requiring in excess of 100 MW of power, which is unrealistic in terms of infrastructure and cost.

Designers are tackling the challenge of reducing power consumption by a number of means, i.e., increased component density, tighter coupling between processor, memory, accelerator, and network, shorter copper paths, increased performance per watt of components, hyper-converged storage, etc.

The relentless quest for increasingly more power-effici...
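Returning to the network interface described in the abstract above, its claimed reduction in software networking system calls can be sketched as follows: the interface's descriptor window is memory-mapped once, after which every transfer is posted with plain stores, never entering the kernel per message. The register offsets, descriptor layout, and the anonymous mapping standing in for the real AXI window on the Zynq UltraScale+ are all assumptions made for illustration.

```python
# Hypothetical sketch of syscall-free posting to a fabric-resident network
# interface. Offsets and descriptor layout are invented; an anonymous mapping
# stands in for the device's memory-mapped AXI window.

import mmap
import struct

WINDOW_SIZE  = 4096
DESC_OFFSET  = 0x00    # assumed: descriptor slot (dest node, remote addr, length)
DOORBELL_OFF = 0x40    # assumed: writing here hands the descriptor to the NI

# In a real system this would be an mmap of the device window; mapped once.
ni_window = mmap.mmap(-1, WINDOW_SIZE)

def post_remote_write(dest_node: int, remote_addr: int, length: int) -> None:
    """Post one remote-memory write using only stores to the mapped window."""
    ni_window[DESC_OFFSET:DESC_OFFSET + 16] = struct.pack(
        "<IQI", dest_node, remote_addr, length)
    ni_window[DOORBELL_OFF:DOORBELL_OFF + 4] = struct.pack("<I", 1)  # ring doorbell

post_remote_write(dest_node=1, remote_addr=0x8000_0000, length=256)
```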
Exascale performance will be delivered by systems composed of millions of interconnected computing cores. The way these computing elements are connected with each other (network topology) has a strong impact on many performance characteristics. In this work we propose a multi-objective optimization-based framework to explore possible network topologies to be implemented in the EU-funded ExaNeSt project. The modular design of this system's interconnect provides great flexibility to design topologies optimized for specific performance targets such as communications locality, fault tolerance or energy consumption. The generation procedure of the topologies is formulated as a three-objective optimization problem (minimizing some topological characteristics) where solutions are searched using evolutionary techniques. The analysis of the results, carried out using simulation, shows that the topologies meet the required performance objectives. In addition, a comparison with a well-known topology reveals that the generated solutions can provide better topological characteristics and also higher performance for parallel applications.
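As a rough illustration of how candidate topologies might be scored inside such a search, the sketch below evaluates three topological objectives for an undirected topology. The abstract does not name its three objectives, so diameter, average shortest-path length, and link count are assumed here, and the surrounding evolutionary loop (e.g., an NSGA-II-style search) is omitted.

```python
# Illustrative sketch: three assumed topological objectives (all minimised)
# that a multi-objective evolutionary search could use to rank candidates.

import itertools

def shortest_paths(n: int, edges: set[tuple[int, int]]) -> list[list[float]]:
    """All-pairs hop counts via Floyd-Warshall on an undirected topology."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for a, b in edges:
        d[a][b] = d[b][a] = 1
    for k, i, j in itertools.product(range(n), repeat=3):
        if d[i][k] + d[k][j] < d[i][j]:
            d[i][j] = d[i][k] + d[k][j]
    return d

def objectives(n: int, edges: set[tuple[int, int]]) -> tuple[float, float, int]:
    """Return (diameter, average path length, number of links)."""
    d = shortest_paths(n, edges)
    pairs = [d[i][j] for i in range(n) for j in range(n) if i != j]
    return max(pairs), sum(pairs) / len(pairs), len(edges)

# A 4-node ring scores (diameter=2, average distance=1.33..., links=4).
print(objectives(4, {(0, 1), (1, 2), (2, 3), (3, 0)}))
```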