Parallel Real-Time Task Scheduling on Multicore Platforms

Anderson, James H.; Calandrino, J.M.

doi:10.1109/rtss.2006.32

Cited by 53 publications

(24 citation statements)

References 13 publications

“…We consider, in particular, the 90nm, 65nm, 45nm, and 32nm technologies. Although to be concrete the configurations described below are based on specific technologies, our results hold more generally across a wide range of cache parameters.…”

Section: CMP Design Space (mentioning)

confidence: 76%

“…Interestingly, Anderson and Calandrino [3] have a similar objective of encouraging the co-scheduling of cooperative threads, but in the context of real-time systems. While their approach is not particularly well-suited to non-real-time systems, their micro-benchmark results do indicate that intelligent co-scheduling of cooperative threads can reduce the number of L2 misses substantially.…”

Section: Related Work (mentioning)

confidence: 99%

Scheduling threads for constructive cache sharing on CMPs

Chen, Gibbons, Kozuch et al. 2007

Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures

120

In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this paper, we compare the performance of two state-of-the-art schedulers proposed for fine-grained multithreaded programs: Parallel Depth First (PDF), which is specifically designed for constructive cache sharing, and Work Stealing (WS), which is a more traditional design. Our experimental results indicate that PDF scheduling yields a 1.3-1.6X performance improvement relative to WS for several fine-grained parallel benchmarks on projected future CMP configurations; we also report several issues that may limit the advantage of PDF in certain applications. These results also indicate that PDF more effectively utilizes off-chip bandwidth, making it possible to trade off on-chip cache for a larger number of cores. Moreover, we find that task granularity plays a key role in cache performance. Therefore, we present an automatic approach for selecting effective grain sizes, based on a new working set profiling algorithm that is an order of magnitude faster than previous approaches. This is the first paper demonstrating the effectiveness of PDF on real benchmarks, providing a direct comparison between PDF and WS, revealing the limiting factors for PDF in practice, and presenting an approach for overcoming these factors.

“…Fedorova et al. [3] proposed a method to reduce the L2 contention by discouraging threads with heavy memory-to-L2 traffic from being co-scheduled. Anderson et al. [7], [8], [9] applied the policy of encouraging or discouraging the co-scheduling of tasks to improve cache performance and also to meet real-time constraints. All these works assumed that the WCETs of tasks are known in advance.…”

Section: Related Work (mentioning)

confidence: 99%

Cache-Aware Cooperative Task Mapping in Multi-core Real-Time Systems

Wang, Ni, Jicheng et al. 2016

IJIEE

Program execution can be accelerated with efficient use of cache in real-time systems. Each program has its own instruction access pattern, which causes an uneven distribution of accesses to the sets of the instruction cache. In multi-core real-time systems, mapping tasks with similar instruction access patterns to the same core will incur massive conflicts and degrade the utilization of the cache. This paper proposes a cache-aware cooperative task mapping method to improve system efficiency in multi-core real-time systems. Our method quantifies the access frequency of instruction cache sets for each task, and then selects tasks with complementary distributions to run on the same core, which can reduce inter-task interference and shorten the cache refill delay during context switches. Evaluation results show that the utilization of the system is improved by about 8.92% with the method.

“…Anderson et al. [23] propose the concept of a megatask as a way to reduce miss rates in shared caches on multicore platforms, and consider Pfair scheduling by inflating the weights of a megatask's component tasks. Preemptive fixed-priority scheduling of parallel tasks is shown to be NP-hard by Han et al. [24].…”

Section: Related Work (mentioning)

confidence: 99%

Multi-core real-time scheduling for generalized parallel task models

et al. 2012

Multi-core processors offer a significant performance increase over single-core processors. Therefore, they have the potential to enable computation-intensive real-time applications with stringent timing constraints that cannot be met on traditional single-core processors. However, most results in traditional multiprocessor real-time scheduling are limited to sequential programming models and ignore intra-task parallelism. In this paper, we address the problem of scheduling periodic parallel tasks with implicit deadlines on multi-core processors. We first consider a synchronous task model where each task consists of segments, each segment having an arbitrary number of parallel threads that synchronize at the end of the segment. We propose a new task decomposition method that decomposes each parallel task into a set of sequential tasks. We prove that our task decomposition achieves a resource augmentation bound of 4 and 5 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively. Finally, we extend our analysis to a directed acyclic graph (DAG) task model where each node in the DAG has a unit execution requirement. We show how these tasks can be converted into synchronous tasks such that the same transformation can be applied and the same augmentation bounds hold.
