During the past decade, high-performance computing has evolved toward multi-core and many-core architectures. General-purpose processors now feature dozens of coarse-grain (complex) cores, each with four to eight SIMD lanes for parallel computation, together with multi-channel memory buses for high bandwidth. Hardware accelerators such as graphics processing units (GPUs) follow a similar two-level design, with multiple coarse units that contain an even larger number of SIMD lanes and wider memory buses for higher bandwidth. Consequently, the adoption of hardware accelerators is rapidly advancing in performance-sensitive areas. They are particularly relevant in high-throughput disciplines such as high-quality 3D computer graphics and vision, real-time data stream processing, and high-performance scientific computing. The main reason behind this trend is that these accelerators can potentially yield speedups and energy savings orders of magnitude greater than those obtained with optimized implementations for general-purpose CPU cores. A clear indicator of this trend is the prevalence of these accelerators in the supercomputing systems at the top positions of both the TOP500 and Green500 lists. As a result, during the past few years, these architectures have become powerful, capable, and inexpensive mainstream coprocessors, useful for a wide variety of applications. Furthermore, they are nowadays present in a large variety of machines, ranging from low-end single-user platforms to supercomputers.

However, the benefits of heterogeneous systems do not come 'for free': scientists using these platforms have to deal not only with multiple levels of parallelism, but also with the programmability differences among the available accelerators. To address these challenges, a very rich programming environment has developed for these devices, particularly in comparison with the restricted landscape of only a few years ago.
A key criterion for characterizing the new high-level programming tools and libraries for these devices is their position within the triangle of performance, coding comfort, and specialization. The spectrum ranges from high-performance building blocks for common numeric or discrete transformations, through domain-specific libraries that facilitate the solution of a certain class of problems, to general high-level abstractions aimed at increasing programmers' productivity.

In summary, the advances both in the hardware and in the programmability of accelerators, coupled with their potentially appealing performance/power ratio for a wide range of applications, have pushed organizations to invest in heterogeneous systems that include accelerators, and have motivated researchers to port their algorithms to such systems and to develop novel tools to facilitate their usage.

This special issue contributes to this important field with extended and carefully reviewed versions of selected papers from two workshops, namely the 3rd Minisymposium on GPU Computing, which was held as