“…Additionally, UM supports GPU memory oversubscription, i.e., GPU kernels can access more data than the GPU memory can hold, which significantly enhances programming portability and productivity for memory-demanding workloads. UM technologies have been adopted by HPC frameworks such as RAJA [6], Kokkos [9], and Trilinos [16] for writing portable applications on current and future major HPC platforms, and by deep learning frameworks [12,22,34]. However, despite active research and improvements by vendors and the research community [3,18,23,42], current UM technologies cause significant, or even prohibitive, performance degradation [25,26,46].…”