The actor model of computation was designed to support concurrency and distribution seamlessly. However, it remains unspecific about data-parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation.
In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors and hence operate in a multi-stage fashion on data resident on the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to differ considerably between devices. Moreover, our findings indicate a negligible overhead compared to programming with the native OpenCL API.
... computing has been widely recognized as an important optimization strategy. In addition, accelerating coprocessors that better support code branching have become established on the market.
Since not all tasks can benefit from such specialized devices, developers need to distribute work across the various architectural elements. Managing such a heterogeneous runtime environment inherently increases complexity.
While some loop-based computations can be offloaded to GPUs using OpenACC [10] or recent versions of OpenMP [15] with relatively little programming effort, it has been shown that a consistent task-oriented design exploits the available parallelism more efficiently. Corresponding approaches achieve better performance [27] and are also applicable to more complex workloads. However, manually orchestrating tasks between multiple devices is complex and error-prone.
The actor model of computation describes applications in terms of isolated software entities (actors) that communicate by asynchronous message passing. Because actors are not allowed to share state, they can always be executed in parallel, and the runtime system can distribute them across any number of processors or machines. The message-based decoupling of software entities further enables actors to run on different devices in a heterogeneous environment. Hence, the actor model can simplify software development by hiding the complexity of heterogeneous and distributed deployments.
In this work, we take up our previous contribution [22] about actors programmed with OpenCL, the Open Computing Language standardized by the Khronos Group [41]. We enhance the integration of heterogeneous programming with the C++ ...