Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2008
DOI: 10.1145/1345206.1345210
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

Abstract: Several parallel architectures, such as GPUs and the Cell processor, have fast explicitly managed on-chip memories in addition to slow off-chip memory. They also have very high computational power with multiple levels of parallelism. A significant challenge in programming these architectures is to effectively exploit the available parallelism and manage the fast memories to maximize performance. In this paper we develop an approach to effective automatic data management for on-chip memories, …

Cited by 79 publications (70 citation statements)
References 42 publications
“…This naive automatic scheme transfers the read set into the GPU when a kernel is invoked, and copies the write set to the CPU immediately after the kernel ends. It is easy to implement in a compiler and has been used in an initial version of Chapel [19] for the GPU and with minor variations in the OpenMP to GPU compiler [12].…”
Section: The Need for Efficient Memory Management
confidence: 99%
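The naive scheme quoted above can be sketched in a few lines. This is a hypothetical, simplified model (not code from the paper or from Chapel): host and device memories are plain dictionaries, and the `run_kernel_naive` helper name is an assumption for illustration. The point is that every launch transfers the full read set in and the full write set out, with no check for data that is already current on either side.

```python
# Hypothetical sketch of the "naive" automatic transfer scheme: every
# kernel launch copies its whole read set host -> device, runs the
# kernel, then copies its whole write set device -> host, regardless
# of whether either copy is already up to date.

def run_kernel_naive(kernel, read_set, write_set, host_mem, device_mem):
    # 1. Transfer every array the kernel reads to the device,
    #    even if the device copy is not stale.
    for name in read_set:
        device_mem[name] = list(host_mem[name])
    # 2. Execute the kernel against device memory.
    kernel(device_mem)
    # 3. Immediately transfer every array the kernel wrote back
    #    to the host, even if the host never reads it next.
    for name in write_set:
        host_mem[name] = list(device_mem[name])

# Usage: a toy kernel that doubles 'a' into 'b'.
host = {"a": [1, 2, 3], "b": [0, 0, 0]}
device = {}
run_kernel_naive(lambda m: m.update(b=[2 * x for x in m["a"]]),
                 read_set={"a"}, write_set={"b"},
                 host_mem=host, device_mem=device)
print(host["b"])  # → [2, 4, 6]
```

The redundant transfers in steps 1 and 3 are exactly what the runtime coherence mechanisms discussed by the citing paper aim to avoid.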
“…Baskaran et al develop an automatic polyhedral model-based framework [1] to insert data transfers statically. Their method is closely tied to their parallelization framework and does not involve any runtime coherence mechanism to avoid transfers of non-stale data.…”
Section: Related Work
confidence: 99%
“…Several papers [11], [12], [13], [14], [15], [17] discuss porting applications to GPUs and improving performance through an optimal assignment of architectural parameters to achieve overall execution efficiency. Our goal, in contrast, deals with a design methodology that improves performance of an application that consists of a composition of kernels.…”
Section: Related Work
confidence: 99%
“…Within the context of GPUs, another research direction involves optimization of shared memory use in GPUs, which are also a form of application-controlled cache. In this area, Baskaran et al have provided an approach for automatically arranging shared memory on NVIDIA GPU by using the polyhedral model for affine loops [4]. Moazeni et al have adapted approaches for register allocation, particularly those based on graph coloring, to manage shared memory on GPU [26].…”
Section: Related Work
confidence: 99%