“…As fundamental primitives in many important application domains such as graph analytics, machine learning, and scientific computation [3,8,17,20,22,38,41,45,47,55,58], Sparse Basic Linear Algebra Subprograms (SpBLAS) are notoriously memory-intensive due to their irregular memory access patterns. Recently, there has been a surge in customizing near-memory hardware accelerators to tackle sparse BLAS applications such as sparse gathering [2,24,30], sparse matrix-vector multiplication (SpMV) [2,42,52], and graph analytics [1,12,36,57,60].…”