The performance of many applications is limited by the available memory bandwidth. One approach to improving the performance of such memory-bound applications is to move the computation closer to the data it requires. Processing in Memory (PIM) integrates computational units directly with the memory. To enable PIM technology in widely used programming models, we propose extensions to OpenMP and OpenACC, two examples of directive-based programming models, as well as SYCL. The extensions are designed to be portable across many existing and future parallel computing devices and platforms, making PIM technology widely available. For the extensions, we propose an end-to-end compilation framework based on several steps of abstraction and progressive lowering. To this end, we formulate a new PIM intermediate representation (IR) and conduct optimizations tailored to hardware characteristics. Using an AMD MI100 GPU with PIM-enabled HBM2 memory, we observe a performance improvement of 1.2 to 2.1 times for representative examples and a real high-performance computing (HPC) application, compared to the same GPU without PIM-enabled memory.