Satisfying deadline requirements is critically important for an embedded application in order to avoid undesired outcomes. Multiprocessor Systems-on-Chip (MPSoCs) play a vital role in contemporary embedded devices in meeting such timing deadlines. These MPSoCs include two-level cache hierarchies that have to be dimensioned carefully to support the timing deadlines of the application(s) while consuming minimum area and therefore minimum power. Given the deadline of an application, the maximum time that may be spent on memory accesses can be derived systematically and then used to dimension suitable cache sizes. Because the dimensioning has to be done rapidly to satisfy time-to-market requirements, we choose a widely acclaimed rapid cache simulation strategy, single-pass trace-driven simulation, for estimating the cache dimensions. For the first time, we address the two main challenges, coherency and scalability, in adapting a single-pass simulator to an MPSoC with a two-level cache hierarchy. These challenges are addressed through a modular bottom-up simulation technique in which the L1 and L2 simulations are handled in independent communicating modules. In this paper, we present how the dimensioning is performed for a two-level inclusive data cache hierarchy in an MPSoC. With the proposed rapid simulation, cache size estimates are produced within an hour (worst case on the considered application benchmarks). We evaluated our approach with task-based MPSoC implementations of the JPEG and H.264 benchmarks and achieved average timing deviations of 16.1% and 7.2%, respectively, against the requested data access times. The deviations are always positive, meaning our simulator is guaranteed to satisfy the requested data access time. In addition, we generated a set of synthetic memory traces and used them to analyse our simulator extensively. For the synthetic traces, our simulator provides cache sizes that always guarantee the requested data access time, with an average deviation below 14.5%.
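The abstract above leaves the simulator internals out; purely as a rough illustration, the Python sketch below shows the classic single-pass (Mattson stack-distance) idea that such trace-driven sizing builds on. It assumes a single fully associative LRU data cache, invented hit/miss latencies and candidate sizes, and a memory-access budget already derived as deadline minus pure computation time; it is not the authors' simulator and ignores the L1/L2 coherency and scalability issues the paper actually addresses.

```python
# Minimal single-pass (stack-distance) cache sizing sketch.
# Assumptions (not from the paper): fully associative LRU cache,
# fixed hit/miss latencies, budget = deadline - computation time.

from collections import OrderedDict

def stack_distance_histogram(trace, block_size=32):
    """One pass over a list of byte addresses; counts reuse distances in blocks."""
    stack = OrderedDict()                 # keys ordered from LRU to MRU
    hist = {}
    for addr in trace:
        blk = addr // block_size
        if blk in stack:
            # Distance = number of distinct blocks touched since the last use.
            dist = len(stack) - list(stack).index(blk) - 1
            stack.pop(blk)
        else:
            dist = float("inf")           # cold miss: misses in any finite cache
        hist[dist] = hist.get(dist, 0) + 1
        stack[blk] = None                 # (re)insert as most recently used
    return hist

def smallest_cache_meeting_budget(hist, n_accesses, budget_cycles,
                                  candidate_blocks=(64, 128, 256, 512, 1024),
                                  hit_latency=1, miss_penalty=50):
    """Return the smallest capacity (in blocks) whose estimated data-access
    time fits within the derived memory-access budget, or None."""
    for cap in candidate_blocks:
        hits = sum(c for d, c in hist.items() if d < cap)
        est_time = hits * hit_latency + (n_accesses - hits) * miss_penalty
        if est_time <= budget_cycles:
            return cap, est_time
    return None, None

# Example with a synthetic trace and an invented cycle budget:
trace = [i * 4 for i in range(4096)] * 4          # four sweeps over 16 KiB
hist = stack_distance_histogram(trace)
print(smallest_cache_meeting_budget(hist, len(trace), budget_cycles=200_000))
```

The single pass is what makes the approach rapid: one traversal of the trace yields hit counts for every candidate capacity at once, instead of re-simulating the trace per configuration.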
Efficiency is paramount in embedded systems to achieve high performance while consuming little area and power. Processors in embedded systems therefore have to be designed carefully to meet such design constraints. Application Specific Instruction set Processors (ASIPs) exploit the nature of the target applications to design an optimal instruction set. Although they are not general enough to execute arbitrary applications, ASIPs are highly preferred in the embedded systems industry, where devices are produced to serve particular application domain(s). Typically, ASIPs are designed from a base processor to which functionality is added for the target applications. This paper studies multi-application ASIPs and their instruction sets, extensively analysing the instructions for inter-domain and intra-domain designs. The metrics analysed are the reusable instructions and the extra cost of adding a particular application. A wide range of applications from various benchmark suites (MiBench, MediaBench and SPEC2006) and domains is analysed for two different architectures (ARM-Thumb and PISA). Our study shows that intra-domain applications share a larger number of common instructions, whereas inter-domain applications have very few common instructions, regardless of the architecture (and therefore the ISA).
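As a rough illustration of the two metrics named above, the following sketch computes reusable instructions and the extra cost of adding an application from per-application instruction sets. The application names and instruction lists are invented placeholders, not the paper's data or methodology.

```python
# Toy sketch of the two metrics: instructions reusable across applications,
# and the instructions that must be added to support one more application.
# All sets below are hypothetical.

apps = {
    "jpeg":  {"add", "sub", "mul", "ldr", "str", "mac", "shift"},   # intra-domain
    "h264":  {"add", "sub", "mul", "ldr", "str", "mac", "sat"},     # intra-domain
    "crc32": {"add", "xor", "ldr", "str", "shift"},                 # inter-domain
}

def reusable_instructions(names):
    """Instructions common to every application in the group."""
    return set.intersection(*(apps[n] for n in names))

def extra_cost(existing, new_app):
    """Instructions that must be added to an ASIP built for `existing`
    so that it can also run `new_app`."""
    covered = set.union(*(apps[n] for n in existing))
    return apps[new_app] - covered

print(reusable_instructions(["jpeg", "h264"]))    # intra-domain: large overlap
print(reusable_instructions(["jpeg", "crc32"]))   # inter-domain: small overlap
print(extra_cost(["jpeg", "h264"], "crc32"))      # cost of adding crc32
```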
Embedded systems designers increasingly use Application Specific Instruction Set Processors (ASIPs) when designing application-specific systems. However, the time to market of a product, which includes the design time of the embedded processor, is an important design metric and a key consideration in the deployment of ASIPs. While the design time of an ASIP is very short compared to that of an ASIC, it is longer than when a general purpose processor is used. A number of tools exist to expedite this design process, and they can be divided into two categories: first, tools that automatically generate HDL descriptions of the processor for both simulation and synthesis; and second, tools that generate instruction set simulators for simulating the hardware models. While the former are useful for measuring the critical path of the design, die area and similar properties, they are extremely slow when simulating real-world software applications. Instruction set simulators, on the other hand, are fast at simulating real-world software applications, but they fail to provide the information so readily available from the HDL models. The framework presented in this paper, RACE, addresses this issue by integrating an automatic HDL generator with a well-known instruction set simulator. Embedded systems designers who use our RACE framework therefore obtain the benefits of both fast instruction set simulation and rapid hardware synthesis at the same time.
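The abstract does not describe RACE's interfaces; purely as a hypothetical sketch, the snippet below shows how such an integration might be driven from a single entry point: a fast ISS run for software performance plus an HDL generation and synthesis pass for hardware metrics. All tool names, flags and file names here are invented.

```python
# Hypothetical sketch only: RACE's actual tools and interfaces are not shown here.
# The idea is one wrapper that yields both software-level and hardware-level metrics.

import subprocess

def evaluate_design(processor_desc, program_binary):
    """Run both flows for one candidate ASIP description and merge the results."""
    # Fast path: instruction set simulation of the real application (invented CLI).
    iss = subprocess.run(
        ["iss_sim", "--model", processor_desc, "--binary", program_binary],
        capture_output=True, text=True, check=True)

    # Slow path: generate HDL and synthesise it for area/critical path (invented CLI).
    subprocess.run(["hdl_gen", "--model", processor_desc, "--out", "cpu.v"],
                   check=True)
    synth = subprocess.run(["synth_tool", "cpu.v", "--report", "area_timing.rpt"],
                           capture_output=True, text=True, check=True)

    return {"software_metrics": iss.stdout, "hardware_report": synth.stdout}
```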