SUMMARY

Current computational systems are heterogeneous by nature, featuring a combination of CPUs and graphics processing units (GPUs). As the latter are becoming an established platform for high-performance computing, the focus is shifting towards the seamless programming of these hybrid systems as a whole. The distinct nature of the architectural and execution models in place raises several challenges, as the best hardware configuration is behavior and workload dependent. In this paper, we address the execution of compound, multi-kernel, Open Computing Language (OpenCL) computations in multi-CPU/multi-GPU environments. We address how these computations may be efficiently scheduled onto the target hardware and how the system may adapt itself to changes in the workload to process and to fluctuations in the CPUs' load. An experimental evaluation attests to the performance gains obtained by the combined use of CPU and GPU devices, when compared with GPU-only executions, and also by the use of data-locality optimizations in CPU environments.