“…To handle ever-growing data sizes in these applications beyond the relatively limited capacity (tens of GBs) of GPU onboard memory, the use of external memory such as host DRAM and solid-state drives (SSDs) can be a cost-effective approach compared with pooling multiple GPUs' memory together [9-11, 18, 22, 28, 31, 33, 37, 39, 40, 43]. In particular, GPU-centric external memory access methods have been shown to yield state-of-the-art runtime performance in workloads involving on-demand, fine-grained random access, such as graph analytics [31, 33]. That is, when the small pieces of data to be read next depend on the current processing results and cannot be determined a priori, it is more efficient to have the GPU initiate data requests than to have the CPU control the data flow between the GPU and external memory.…”