Abstract-This paper explores the design of a circuitswitched network-on-chip (NoC) based on time-divisionmultiplexing (TDM) for use in hard real-time systems. Previous work has primarily considered application-specific systems. The work presented here targets general-purpose hardware platforms. We consider a system with IP-cores, where the TDM-NoC must provide directed virtual circuits -all with the same bandwidth -between all nodes. This may not be a frequent scenario, but a general platform should provide this capability, and it is an interesting point in the design space to study. The paper presents an FPGA-friendly hardware design, which is simple, fast, and consumes minimal resources. Furthermore, an algorithm to find minimum-period schedules for all-to-all virtual circuits on top of typical physical NoC topologies like 2D-mesh, torus, bidirectional torus, tree, and fat-tree is presented. The static schedule makes the NoC timepredictable and enables worst-case execution time analysis of communicating real-time tasks.
With the rising number of cores on a single chip the question on how to organize the communication among those cores becomes more and more relevant. A common solution is to use a network-on-chip (NoC) that provides communication bandwidth, routing, and arbitration among the cores. The use of NoCs in real-time systems is problematic, since the shared network and all cores connected to it have to be analyzed to derive time bounds of real-time tasks.We propose to use a statically scheduled, time-divisionmultiplexed NoC design that allows a decoupled analysis of individual real-time tasks. Our network provides virtual circuits between all cores. These virtual circuits are implemented by delivering messages periodically on a static, fixed routing schedule. Since the routing does not change, it can be pre-computed offline.This work focuses on the computation of routing schedules for symmetric NoC topologies, e.g., torus and hypercube. Due to the symmetry, the all-to-all communication can be modeled via simplified communication patterns that are concurrently processed by all routers. The scheduling problem is solved by a heuristic that tries to maximize the overlap of active patterns. Our experiments show that, for larger networks, our heuristic yields schedule lengths that are only 15% to 20% longer than theoretical lower bounds.
Abstract-Real-time systems need time-predictable architectures to support static worst-case execution time (WCET) analysis. One architectural feature, the data cache, is hard to analyze when different data areas (e.g., heap allocated and stack allocated data) share the same cache. This sharing leads to less precise results of the cache analysis part of the WCET analysis.Splitting the data cache for different data areas enables composable data cache analysis. The WCET analysis tool can analyze the accesses to these different data areas independently.In this paper we present the design and implementation of a cache for stack allocated data. Our port of the LLVM C++ compiler supports the management of the stack cache. The combination of stack cache instructions and the hardware implementation of the stack cache is a further step towards timepredictable architectures.
Instruction selection is a well-studied compiler phase that translates the compiler's intermediate representation of programs to a sequence of target-dependent machine instructions optimizing for various compiler objectives ( e.g. speed and space). Most existing instruction selection techniques are limited to the scope of a single statement or a basic block and cannot cope with irregular instruction sets that are frequently found in embedded systems. We consider an optimal technique for instruction selection that uses Static Single Assignment ( SSA ) graphs as an intermediate representation of programs and employs the Partitioned Boolean Quadratic Problem ( PBQP ) for finding an optimal instruction selection. While existing approaches are limited to instruction patterns that can be expressed in a simple tree structure, we consider complex patterns producing multiple results at the same time including pre/post increment addressing modes, div-mod instructions, and SIMD extensions frequently found in embedded systems. Although both instruction selection on SSA-graphs and PBQP are known to be NP-complete, the problem can be solved efficiently - even for very large instances. Our approach has been implemented in LLVM for an embedded ARMv5 architecture. Extensive experiments show speedups of up to 57% on typical DSP kernels and up to 10% on SPECINT 2000 and MiBench benchmarks. All of the test programs could be compiled within less than half a minute using a heuristic PBQP solver that solves 99.83% of all instances optimally.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.