THE CONTINUING GROWTH in network bandwidth and services, the need to adapt products to rapid market changes, and the introduction of new network protocols have created the need for a new breed of high-performance, flexible system-on-a-chip (SoC) design platforms. Emerging to meet this challenge is the network processor unit. An NPU is a SoC that includes a highly integrated set of programmable or hardwired accelerated engines, a memory subsystem, high-speed interconnect, and media interfaces to handle packet processing at wire speed [1]. Programmable NPUs preserve customers' investments by letting them track ongoing specification changes [2]. By developing a programmable NPU as a reusable platform, network designers can amortize a significant design effort over a range of architecture derivatives. They can also meet technical challenges arising from a product's time-to-market constraints, as well as economic constraints arising from a product's brief in-market time. StepNP is a system-level exploration platform for NPUs developed at STMicroelectronics. Its main components are a high-level multiprocessor-architecture simulation model; a network router application framework; and a SoC control, debugging, and analysis toolset. We focus here on the hardware architecture simulation platform, with emphasis on the transaction-level communication channel interface and our model interaction, instrumentation, and analysis approach.
Wire-speed packet forwarding
Packet forwarding over a network includes the following main tasks: header parsing, packet classification, lookup, computation, data manipulation, queue management, and control processing. Control processing usually takes place on a standard reduced-instruction-set-computing (RISC) processor linked to the NPU and is not the focus of this article. Wire-speed packet forwarding, at rates often exceeding 1 Gbit per second, poses many more challenges than general-purpose data processing. In network processing, both memory capac
In this paper, we describe the MultiFlex multi-processor SoC programming environment, with a focus on two programming models: a distributed system object component (DSOC) message-passing model, and a symmetrical multi-processing (SMP) model using shared memory. The MultiFlex tools map these models onto the StepNP multi-processor SoC platform, while making use of hardware accelerators for message passing and task scheduling. We present the results of mapping an Internet traffic management application, running at 2.5 Gb/s.
The introduction of Transaction Level Modeling (TLM) allows a system designer to model a complete application, composed of hardware and software parts, at several levels of abstraction. The simulation speed of TLM is orders of magnitude faster than traditional RTL simulation; nevertheless, it can become a limiting factor when considering a Multi-Processor System-On-Chip (MP-SoC), as the analysis of these systems can be very complex. The main goal of this paper is to introduce a novel way of exploiting TLM features to increase simulation efficiency of complex systems by switching TLM models at runtime. Results show that simulation performance can be increased significantly without sacrificing the accuracy of critical application kernels.
Architectural heterogeneity is a promising solution to overcome the utilization wall and provide Moore's Law-like performance scaling in future SoCs. However, heterogeneous architectures increase the size and complexity of the design space along several axes: granularity of the heterogeneous processors, coupling with the software cores, communication interfaces, etc. As a consequence, significant enhancements are required to tools and methodologies to explore the huge design space effectively. In this work, we provide three main contributions: first, we describe an extension to the STMicroelectronics P2012 platform to support tightly-coupled shared memory HW processing elements (HWPE), along with our changes to the P2012 simulation flow to integrate this extension. Second, we propose a novel methodology for the semi-automatic definition and instantiation of HWPEs from a C program based on an interface description language. Third, we explore several architectural variants on a set of benchmarks originally developed for the homogeneous version of P2012, achieving up to 123x speedup for the accelerated code region (∼98% of the Amdahl limit for the whole application), thereby demonstrating the efficiency of tightly memory-coupled hardware acceleration.
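The relationship between a region speedup and the Amdahl limit mentioned in the abstract above can be illustrated with a minimal Python sketch. It uses the standard form of Amdahl's law; the function names and the accelerated fraction `p = 0.5` are hypothetical, chosen only for illustration, and are not figures from the paper. The reading of "fraction of the Amdahl limit" as whole-application speedup divided by its upper bound is also an assumption.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Whole-application speedup when a fraction p of the original
    runtime is accelerated by a factor s (standard Amdahl's law)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p: float) -> float:
    """Upper bound on whole-application speedup as s grows without bound."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


def fraction_of_limit(p: float, s: float) -> float:
    """How close the achieved whole-application speedup is to the limit."""
    return amdahl_speedup(p, s) / amdahl_limit(p)


if __name__ == "__main__":
    p = 0.5    # hypothetical accelerated fraction, NOT taken from the paper
    s = 123.0  # region speedup quoted in the abstract
    print(f"overall speedup:   {amdahl_speedup(p, s):.3f}")
    print(f"Amdahl limit:      {amdahl_limit(p):.3f}")
    print(f"fraction of limit: {fraction_of_limit(p, s):.3%}")
```

Under this reading, a large region speedup matters mostly because it pushes the whole-application result close to the bound set by the code that is not accelerated; the actual accelerated fraction depends on each benchmark.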