“…Over the past decade, much research has focused on designing efficient algorithms for a range of classical problems on GPUs [9,12,17,27,36,37,39]. These works have introduced several optimization techniques, such as coalesced memory accesses [10,11,35], branch divergence elimination [19,23], and bank conflict avoidance [1,6,9,19]. Several empirical models that rely on micro-benchmarking have been proposed for specific GPUs [5,41], and several fast GPU algorithms have been produced [10,17,39] by using benchmarks [40] and applying hardware-specific optimization techniques to existing algorithms.…”