L.E.S. Ramos scite author profile

5 Publications

26 Citation Statements Received

157 Citation Statements Given

Affiliations

Universidade Estadual de Campinas, Pontifícia Universidade Católica de Minas Gerais, Rutgers Sexual and Reproductive Health and Rights

Publications

PSkel: A stencil programming framework for CPU‐GPU systems

Pereira

Ramos

Góes

2015

Concurrency and Computation

The use of Graphics Processing Units (GPUs) for high-performance computing has gained growing momentum in recent years. Unfortunately, GPU-programming platforms like Compute Unified Device Architecture (CUDA) are complex and user-unfriendly, and they increase the complexity of developing high-performance parallel applications. In addition, runtime systems that execute those applications often fail to fully utilize the parallelism of modern CPU-GPU systems. Typically, parallel kernels run entirely on the most powerful device available, leaving other devices idle. These observations sparked research in two directions: (1) high-level approaches to software development for GPUs, which strike a balance between performance and ease of programming; and (2) task partitioning to fully utilize the available devices. In this paper, we propose a framework, called PSkel, that provides a single high-level abstraction for stencil programming on heterogeneous CPU-GPU systems, while allowing the programmer to partition and assign data and computation to both CPU and GPU. Our current implementation uses parallel skeletons to transparently leverage Intel Threading Building Blocks (Intel Corporation, Santa Clara, CA, USA) and NVIDIA CUDA (Nvidia Corporation, Santa Clara, CA, USA). In our experiments, we observed that parallel applications with task partitioning can improve average performance by up to 76% and 28% compared with CPU-only and GPU-only parallel applications, respectively.

A common approach to addressing CPU-GPU programming complexity is the use of algorithmic skeletons. Parallel skeletons model and abstract common parallel programming patterns (computation and coordination phases), thereby enabling the programmer to focus on algorithm design rather than on runtime system details. Among existing parallel skeletons, the stencil pattern is critical in many scientific computing domains, including image and signal processing and computational fluid dynamics [3,4]. The large body of recent work targeting GPU implementations of high-performance stencil computations underscores the importance of that pattern [5–8].

Another important aspect of CPU-GPU platforms is that their runtime systems generally fail to exploit the platform's full potential for parallel processing. Specifically, the runtime systems do not partition the work (computations and data) of parallel applications across CPUs and GPUs to increase device utilization. To address this, many existing frameworks provide runtime systems that enable either static or dynamic task partitioning [5,9–13]. However, those frameworks either fail to provide high-level abstractions, support only multi-GPU systems, or do not partition tasks across both CPU and GPU simultaneously. These observations call for systems that can both exploit task partitioning efficiently and provide high-level abstractions for CPU-GPU programming.

In this paper, we propose and evaluate PSkel (Parallel Skeletons), a framework for stencil programming in heterogeneous CPU-GPU systems. PSkel ...
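The idea of a stencil skeleton whose work is split between a CPU and a GPU can be illustrated with a small sketch. This Python code is not PSkel's API (PSkel builds on Intel TBB and CUDA); it assumes a 2D 5-point averaging stencil with fixed border cells, and two threads merely stand in for the two devices. The names `compute_partition`, `partitioned_step`, `run`, and the `gpu_fraction` parameter are hypothetical. Each partition copies only its own rows plus one halo row on each side, which is the extra data a device would need before every sweep.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Grid = List[List[float]]


def compute_partition(src: Grid, lo: int, hi: int) -> Grid:
    """One sweep of a 5-point averaging stencil over rows [lo, hi) of src.

    Only the partition's rows plus one halo row on each side are copied,
    mimicking the data a single device would receive. Border cells stay fixed.
    """
    rows, cols = len(src), len(src[0])
    h_lo, h_hi = max(lo - 1, 0), min(hi + 1, rows)
    local = [list(r) for r in src[h_lo:h_hi]]  # partition + halo rows
    out: Grid = []
    for i in range(lo, hi):
        li = i - h_lo  # position of global row i inside `local`
        row = list(local[li])
        if 0 < i < rows - 1:  # only interior rows are updated
            for j in range(1, cols - 1):
                row[j] = (local[li][j] + local[li - 1][j] + local[li + 1][j]
                          + local[li][j - 1] + local[li][j + 1]) / 5.0
        out.append(row)
    return out


def partitioned_step(src: Grid, gpu_fraction: float) -> Grid:
    """Give the first gpu_fraction of the rows to one worker and the rest to another."""
    split = round(len(src) * gpu_fraction)
    with ThreadPoolExecutor(max_workers=2) as pool:
        gpu = pool.submit(compute_partition, src, 0, split)         # stand-in for the GPU share
        cpu = pool.submit(compute_partition, src, split, len(src))  # stand-in for the CPU share
        return gpu.result() + cpu.result()


def run(src: Grid, iterations: int, gpu_fraction: float) -> Grid:
    grid = src
    for _ in range(iterations):  # both partitions finish before the next sweep starts
        grid = partitioned_step(grid, gpu_fraction)
    return grid
```

For a 3×3 grid that is zero except for a 5.0 in the center, `run(grid, 1, 0.5)` yields 1.0 in the center. Because every cell is computed from the same inputs in the same order, the result is identical for any `gpu_fraction`; only the division of work changes. On real hardware the split would typically be tuned so both devices finish each sweep at about the same time, and the halo rows would have to be exchanged between CPU and GPU memory after every iteration.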

TOAST: Automatic tiling for iterative stencil computations on GPUs

Rocha

Pereira

Ramos

et al. 2017

Concurrency and Computation

The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on graphics processing units (GPUs). In particular, tiling is a technique that can significantly enhance application performance by improving data locality and by reducing the volume of communication between host memory and the GPU. In addition, tiling enables stencil applications to process inputs that are larger than the physical GPU memory. However, implementing tiling efficiently is complex, time-consuming, and error-prone. In this paper, we propose transparently optimized automatic stencil tiling (TOAST), an automatic tiling mechanism for iterative stencil computations running on GPUs. TOAST has three main benefits: (1) it incorporates an optimization model that seeks to maximize data reuse within tiles while respecting the amount of dynamically available GPU memory; (2) it offers a virtualized GPU memory for stencil computations, allowing for large input data; and (3) it performs optimal tiling transparently to the developer of the parallel stencil application. The current implementation of TOAST augments the PSkel framework with an internal solver based on genetic algorithms. Our experimental results show that TOAST improves the performance of iterative stencil applications by up to 13× compared with their optimized multithreaded (CPU-based) versions and by up to 48× compared with a naive tiling approach on the GPU. TOAST automatically keeps the data-management overhead low relative to the actual stencil computation.
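Tiling an iterative stencil can be sketched with overlapped (ghost-zone) tiling on a 1D array. This is a generic technique shown under simplifying assumptions, not TOAST's implementation: the stencil is a 3-point average with fixed end cells, the tile width and the number of fused time steps (`depth`) are chosen by hand rather than by TOAST's genetic-algorithm solver, and a Python slice stands in for the host-to-GPU transfer. All names (`step`, `reference`, `tiled`, `max_tile_width`, `budget_cells`) are hypothetical. Each tile is loaded with `k` halo cells on each side so that `k` iterations can run on the tile alone before its own cells are written back.

```python
from typing import List


def step(a: List[float]) -> List[float]:
    """One sweep of a 3-point averaging stencil; both end cells stay fixed."""
    out = list(a)
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out


def reference(a: List[float], iterations: int) -> List[float]:
    """Untiled baseline: sweep the whole array once per iteration."""
    for _ in range(iterations):
        a = step(a)
    return a


def max_tile_width(budget_cells: int, depth: int) -> int:
    """Widest tile whose cells plus 2*depth halo cells fit a hypothetical memory budget."""
    width = budget_cells - 2 * depth
    if width < 1:
        raise ValueError("memory budget too small for this fusion depth")
    return width


def tiled(a: List[float], iterations: int, tile: int, depth: int) -> List[float]:
    """Overlapped tiling: each tile plus its halo is advanced up to `depth` steps on its own."""
    if tile < 1 or depth < 1:
        raise ValueError("tile and depth must be positive")
    n = len(a)
    cur = list(a)
    done = 0
    while done < iterations:
        k = min(depth, iterations - done)  # time steps fused in this pass
        nxt = list(cur)
        for lo in range(0, n, tile):
            hi = min(lo + tile, n)
            h_lo, h_hi = max(lo - k, 0), min(hi + k, n)
            local = cur[h_lo:h_hi]  # stand-in for copying tile + halo to the GPU
            for _ in range(k):
                local = step(local)  # halo cells go stale; the tile's own cells stay exact
            nxt[lo:hi] = local[lo - h_lo:hi - h_lo]  # write back only the tile's cells
        cur = nxt
        done += k
    return cur
```

A larger `depth` reuses each loaded tile for more iterations but widens the halo, so a tile of width `w` needs `w + 2*depth` cells of device memory; `max_tile_width` expresses that trade-off for a given budget (for example, a 64-cell budget with `depth=4` leaves room for a 56-cell tile). Choosing these parameters jointly under the available GPU memory is the kind of problem TOAST's optimization model addresses.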

ClusterSim: a Java-based parallel discrete-event simulation tool for cluster computing

Góes

Ramos

Martins

A new learning method of microprocessor architecture

Martins

Corrêa

Góes

et al.

Automatic Partitioning of Stencil Computations on Heterogeneous Systems

Pereira

Rocha

Ramos

et al. 2017
