Data-intensive applications such as artificial intelligence and virtual reality are driving an increasing demand for novel computing structures. Processing-in-memory (PIM) is a promising alternative for reducing the overhead caused by data movement. Many studies have explored the use of PIM to take advantage of the bandwidth increase enabled by through-silicon vias (TSVs). One approach is to design a PIM architecture optimized for a specific application; the other is to identify the tasks that benefit most from being offloaded to PIM. The goal of this paper is to make PIM, a newly introduced technology, easy to apply to a variety of applications. The target system is a programmable GPU-based PIM. Essential but simple task-offloading conditions are proposed to secure as many candidate tasks as possible whenever there is any potential benefit from PIM. The PIM design options are then explored, actively reflecting the characteristics of the candidate tasks. Because it is difficult to consider the three time-energy-power objectives simultaneously when determining offloading conditions, the problem is divided into two sub-problems: the first offloading condition is designed from time-energy constraints, whereas the second is modeled to satisfy time-power constraints. Throughout the whole process, the offloading conditions and the PIM design options are carefully configured in a complementary manner to reduce the number of tasks excluded from offloading. In the simulation results, the suitability of the two modeled offloading conditions and the proposed PIM design is verified using various benchmarks, and they are compared with previous works in terms of processing speed and energy.

INDEX TERMS High bandwidth memory, near-data-processing, processing-in-memory, task offloading.

The associate editor coordinating the review of this manuscript and approving it for publication was Cihun-Siyong Gong.

I. INTRODUCTION

Recently, increasing attention has been paid to applications that require massive data processing, such as artificial intelligence and virtual reality. The parallelism of the graphics processing unit (GPU) has been used to increase processing speed for such applications; however, efficiency in terms of data movement has not been a major concern [1]. Therefore, there is an increasing demand for a novel computing structure optimized for the execution of data-intensive applications. Near-data-processing (NDP), which places the processing unit close to the data, is a promising alternative for reducing the overhead caused by data movement. When the processor is near memory, it is defined as processing-in-memory (PIM). An example is packaging a processing unit within a memory module or inside a DRAM chip. Recently, through-silicon via (TSV) technology has enabled a CPU, GPU, or hardware accelerator to be mounted on the logic die of 3-dimensional stacked memory. Thanks to this, PIM technology, which vertically stacks DRAM and a logic die with pr...
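The two offloading conditions described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual model: the function name, parameters, and the way the two conditions are combined are all hypothetical, assuming only that the first condition weighs execution time against energy and the second weighs execution time against a power budget.

```python
# Illustrative sketch only; all names and thresholds are hypothetical,
# not taken from the paper's offloading model.
def offload_to_pim(t_gpu, t_pim, e_gpu, e_pim, p_pim, p_budget):
    """Decide whether a task is a PIM offloading candidate.

    t_gpu, t_pim: estimated execution time on GPU vs. PIM
    e_gpu, e_pim: estimated energy on GPU vs. PIM
    p_pim:        estimated PIM power draw for this task
    p_budget:     power budget available to the PIM logic die
    """
    # Condition 1 (time-energy): PIM is no slower and saves energy.
    cond_time_energy = t_pim <= t_gpu and e_pim < e_gpu
    # Condition 2 (time-power): PIM is faster and fits the power budget.
    cond_time_power = t_pim < t_gpu and p_pim <= p_budget
    # Assumed here: a task qualifies if either condition holds, which keeps
    # the candidate set as large as possible.
    return cond_time_energy or cond_time_power
```

Splitting the decision this way avoids optimizing time, energy, and power jointly: each condition checks only two of the three objectives.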