Abstract: Heterogeneous MPSoCs in which different types of cores share a baseline ISA but implement different operational accelerators combine programmability with flexible customization. They hold promise for high performance under power and area limitations. However, transparent binary execution and dynamic scheduling are hard on such platforms. The state-of-the-art approach for transparent accelerated execution is fault-and-migrate (FAM): when a thread executes an accelerating instruction unavailable on the host core, it is forcibly migrated to an accelerating core that implements the instruction natively. Unfortunately, this approach prohibits dynamic scheduling through flexible thread migration, which is essential on any asymmetric platform for the efficient utilization of heterogeneous resources. We present two distinct binary-level techniques, Dynamic Binary Rewriting (DBR) and Dynamic Binary Translation (DBT), which enable selective acceleration while preserving transparent thread execution and migration to any core in the system, at any point in time. DBR rewrites binary code to exploit any accelerating instructions available on the host core. DBT implements a fault-and-rewrite scheme, which sets up trampolines to emulation routines for those accelerating instructions that are not available on the host core. Both methods customize binary code on demand, enabling flexible migration. We evaluate the overhead of DBR and DBT against FAM on a real-hardware shared-ISA MPSoC prototype. Experiments with single-thread programs show that flexible migration is possible with manageable overhead. We measure the performance of our binary-level techniques by artificially triggering periodic thread migration between a Base core and an accelerating (ACC) core. Periodic migration, without aiming for optimized scheduling, results in an average slowdown of about 40% under DBR or about 10% under DBT, compared to FAM-driven scheduling. We also show results for a speedup-proportional dynamic scheduler, enabled by our techniques, using multi-program workloads. In this case, up to 50% faster execution times can be achieved by leveraging flexible thread migration.
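
To make the fault-and-rewrite idea behind DBT concrete, the following is a minimal sketch in C of the trapping half of the scheme: a SIGILL handler catches an accelerating instruction that the host core does not implement and dispatches it to a software emulation routine. This is an illustration under stated assumptions, not the paper's implementation; emulate_acc_insn, ACC_INSN_LEN, and the Linux/x86-64 register access are hypothetical stand-ins, and a full DBT scheme would additionally patch a trampoline to the emulation routine into the code so later executions of that site do not fault again.

    /* Minimal sketch of a fault-and-rewrite entry point, assuming a
     * Linux host. emulate_acc_insn() and ACC_INSN_LEN are hypothetical;
     * a real DBT system would also plant a trampoline at the faulting
     * site so subsequent executions bypass the trap entirely. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdint.h>
    #include <string.h>
    #include <ucontext.h>

    #define ACC_INSN_LEN 4  /* assumed fixed-width accelerating insn */

    /* Hypothetical software emulation of the missing instruction:
     * decode it and update the interrupted register state in *mc. */
    static void emulate_acc_insn(mcontext_t *mc, const uint8_t *insn)
    {
        (void)mc;
        (void)insn;
    }

    static void sigill_handler(int sig, siginfo_t *si, void *uc_raw)
    {
        (void)sig;
        ucontext_t *uc = (ucontext_t *)uc_raw;
        const uint8_t *pc = (const uint8_t *)si->si_addr; /* faulting insn */

        emulate_acc_insn(&uc->uc_mcontext, pc);

    #ifdef __x86_64__
        /* Resume after the emulated instruction; the PC field name is
         * platform-specific (shown for x86-64 Linux). */
        uc->uc_mcontext.gregs[REG_RIP] += ACC_INSN_LEN;
    #endif
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = sigill_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGILL, &sa, NULL);

        /* ... run the workload: an accelerating instruction that is
         * unavailable on this core now traps into sigill_handler and
         * is emulated in place, instead of forcing a migration to an
         * accelerating core as under FAM. */
        return 0;
    }

Handling the fault on the host core, rather than migrating on it, is what keeps the thread schedulable on any core: the scheduler remains free to place or move the thread, and only pays an emulation (or, after rewriting, a trampoline) cost on cores that lack the instruction.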