The use of Graphics Processing Units (GPUs) for high-performance computing has gained momentum in recent years. Unfortunately, GPU-programming platforms like Compute Unified Device Architecture (CUDA) are complex and user-unfriendly, which makes high-performance parallel applications harder to develop. In addition, runtime systems that execute those applications often fail to fully utilize the parallelism of modern CPU-GPU systems. Typically, parallel kernels run entirely on the most powerful device available, leaving other devices idle. These observations sparked research in two directions: (1) high-level approaches to software development for GPUs, which strike a balance between performance and ease of programming; and (2) task partitioning to fully utilize the available devices. In this paper, we propose a framework, called PSkel, that provides a single high-level abstraction for stencil programming on heterogeneous CPU-GPU systems, while allowing the programmer to partition and assign data and computation to both CPU and GPU. Our current implementation uses parallel skeletons to transparently leverage Intel Threading Building Blocks (Intel Corporation, Santa Clara, CA, USA) and NVIDIA CUDA (Nvidia Corporation, Santa Clara, CA, USA). In our experiments, we observed that parallel applications with task partitioning can improve average performance by up to 76% and 28% compared with CPU-only and GPU-only parallel applications, respectively.

A common approach to addressing the complexity of CPU-GPU programming is the use of algorithmic skeletons. Parallel skeletons model and abstract common parallel programming patterns (computation and coordination phases), thereby enabling the programmer to focus on algorithm design rather than on runtime system details. Among existing parallel skeletons, the stencil pattern is critical in many scientific computing domains, including image and signal processing and computational fluid dynamics [3,4].
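To make the stencil pattern and the idea of task partitioning concrete, the following minimal C++ sketch applies one iteration of a four-neighbour averaging stencil to a 2D grid and splits the rows into two shares. It is an illustration only and does not reproduce PSkel's API: the names `Grid`, `stencilRows`, `partitionedStencil`, and `gpuFraction` are hypothetical, and both shares run on the host here, whereas a real heterogeneous runtime would send one share to the GPU and the other to the CPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Row-major 2D grid of floats (hypothetical helper type for this sketch).
struct Grid {
    std::size_t rows, cols;
    std::vector<float> data;
    Grid(std::size_t r, std::size_t c, float v = 0.0f)
        : rows(r), cols(c), data(r * c, v) {}
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// One stencil iteration over rows [rowBegin, rowEnd): every interior cell
// becomes the average of its four neighbours; border cells are copied.
void stencilRows(const Grid& in, Grid& out,
                 std::size_t rowBegin, std::size_t rowEnd) {
    for (std::size_t i = rowBegin; i < rowEnd; ++i) {
        for (std::size_t j = 0; j < in.cols; ++j) {
            const bool border =
                i == 0 || j == 0 || i + 1 == in.rows || j + 1 == in.cols;
            out.at(i, j) = border
                ? in.at(i, j)
                : 0.25f * (in.at(i - 1, j) + in.at(i + 1, j) +
                           in.at(i, j - 1) + in.at(i, j + 1));
        }
    }
}

// Static partitioning (assumption: a fixed fraction chosen by the programmer).
// The first share stands in for the GPU, the second for the CPU. Each share
// reads neighbouring rows from the shared input grid, so no halo exchange is
// needed in this single-address-space sketch.
void partitionedStencil(const Grid& in, Grid& out, double gpuFraction) {
    const double f = std::min(1.0, std::max(0.0, gpuFraction));
    const std::size_t split = static_cast<std::size_t>(f * in.rows);
    stencilRows(in, out, 0, split);        // share that would go to the GPU
    stencilRows(in, out, split, in.rows);  // share that would go to the CPU
}
```

Because each output cell depends only on the input grid, the two shares are independent and could be processed concurrently. With separate device memories, the rows adjacent to the split would additionally have to be copied to both devices.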
The large body of recent work targeting GPU implementations of high-performance stencil computations stresses the importance of that pattern [5][6][7][8].

Another important aspect of CPU-GPU platforms is that their runtime systems generally fail to exploit the platform's full potential for parallel processing. Specifically, the runtime systems do not partition the work (computations and data) of parallel applications across CPUs and GPUs to increase their utilization. For that reason, many existing frameworks have runtime systems that enable either static or dynamic task partitioning [5][9][10][11][12][13]. However, those frameworks either fail to provide high-level abstractions, support only multi-GPU systems, or do not partition tasks to both CPU and GPU simultaneously. These observations call for systems that can both exploit task partitioning efficiently and provide high-level abstractions for CPU-GPU programming.

In this paper, we propose and evaluate PSkel (Parallel Skeletons), a framework for stencil programming in heterogeneous CPU-GPU systems. PSkel ...