Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures 2012
DOI: 10.1145/2312005.2312049

Cache-conscious scheduling of streaming applications

Abstract: This paper considers the problem of scheduling streaming applications on uniprocessors in order to minimize the number of cache-misses. Streaming applications are represented as a directed graph (or multigraph), where nodes are computation modules and edges are channels. When a module fires, it consumes some data-items from its input channels and produces some items on its output channels. In addition, each module may have some state (either code or data) which represents the memory locations that must be load…
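
The model described in the abstract can be illustrated with a minimal sketch. The names Channel, Module, and state_size, as well as the fixed per-firing rates, are assumptions made for this illustration and are not taken from the paper. The loop at the end is a naive round-robin schedule, not the cache-conscious scheduler the paper studies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Channel:
    """An edge of the streaming graph: a FIFO queue of data items."""
    items: deque = field(default_factory=deque)


@dataclass
class Module:
    """A node of the streaming graph (illustrative, not the paper's notation)."""
    name: str
    inputs: list          # list of (Channel, items consumed per firing)
    outputs: list         # list of (Channel, items produced per firing)
    state_size: int = 0   # hypothetical: size of the module's code/data state

    def can_fire(self) -> bool:
        # A module may fire only if every input channel holds enough items.
        # Note: a module with no inputs is always ready under this rule.
        return all(len(ch.items) >= n for ch, n in self.inputs)

    def fire(self) -> None:
        if not self.can_fire():
            raise RuntimeError(f"{self.name} lacks input items")
        # Consume the required number of items from each input channel.
        consumed = [ch.items.popleft() for ch, n in self.inputs for _ in range(n)]
        # Produce placeholder items on each output channel.
        for ch, n in self.outputs:
            for _ in range(n):
                ch.items.append((self.name, len(consumed)))


# Two-module pipeline: A consumes 2 items per firing and produces 1; B consumes 1.
c1, c2 = Channel(), Channel()
c1.items.extend(range(4))
a = Module("A", inputs=[(c1, 2)], outputs=[(c2, 1)])
b = Module("B", inputs=[(c2, 1)], outputs=[])

# Naive round-robin schedule: fire any ready module until none can fire.
schedule = []
while a.can_fire() or b.can_fire():
    for m in (a, b):
        if m.can_fire():
            m.fire()
            schedule.append(m.name)
```

A cache-conscious scheduler would choose the firing order differently, for example by grouping firings so that a module's state stays in cache across several firings; the sketch only shows the firing rule itself.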

Cited by 10 publications (5 citation statements)
References 29 publications

“…[3] present theoretical cache miss limits when scheduling streaming applications represented as directed graphs on uniprocessors. Their work shows that scheduling the graph by selecting partitions comes within a constant factor of the optimal scheduler when heuristics such as working set and data usage rates are known in advance.…”
Section: Improving Cache Efficiency (mentioning)
confidence: 99%
“…Closer to our objective of optimizing the parallel execution time, another formulation of the DAG partitioning problem arises in exposing parallelism in automatic differentiation [4,Ch.9], and in general, in the computation of the Newton step for solving nonlinear systems [5]. Other important applications of the DAG partitioning problem include (i) fusing loops for improving temporal locality, and enabling streaming and array contractions in runtime systems [6], such as Bohrium [7]; (ii) analysis of cache efficient execution of streaming applications on uniprocessors [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…First, we plan to experiment our monitoring approach over platforms for which memory latency vary more and for which the placement decisions will have a greater impact. This will be a strong complement to existing compilation strategies that already take the underlying memory hierarchy into account, eg [25], [22], [11], [10], [1].…”
Section: Discussion (mentioning)
confidence: 99%