“…As a result, the inevitable memory bottleneck drives both industry and academia to reassess DRAM-based near-memory processing (NMP) [7], [12], [19], [25], [50] and processing-in-memory (PIM) [13], [14], [16], [21], [22], [30], [31], [32], [34], [38], [48], [49] architectures, which increase internal bandwidth by tightly integrating computational logic with DRAM devices/cells. NMP architectures place a homogeneous processing unit (PU) in each vault of the base logic die of a hybrid memory cube (HMC), supporting flexible dataflow for query operations.…”
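The per-vault organization described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the design of any cited work. It assumes a 32-vault cube, a hypothetical round-robin placement of rows across vaults, and a simple selection query. Each vault's PU scans only its local rows, and only the matching rows leave the cube. The sequential loop stands in for PUs that would run in parallel in hardware. All names (`Vault`, `partition`, `nmp_select`, `NUM_VAULTS`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

NUM_VAULTS = 32  # assumption: a 32-vault HMC-style cube


@dataclass
class Vault:
    # Rows resident in this vault's stacked DRAM layers.
    rows: List[int] = field(default_factory=list)

    def run_pu(self, predicate: Callable[[int], bool]) -> List[int]:
        # The vault's PU scans only local rows, using intra-vault bandwidth.
        return [r for r in self.rows if predicate(r)]


def partition(data: List[int], n: int) -> List[Vault]:
    # Hypothetical round-robin placement of rows across n vaults.
    vaults = [Vault() for _ in range(n)]
    for i, x in enumerate(data):
        vaults[i % n].rows.append(x)
    return vaults


def nmp_select(vaults: List[Vault],
               predicate: Callable[[int], bool]) -> Tuple[List[int], int, int]:
    # Each PU filters its own vault (modeled sequentially here).
    # Returns (matches, rows scanned inside the cube, rows sent off-chip).
    matches: List[int] = []
    scanned = 0
    for v in vaults:
        scanned += len(v.rows)
        matches.extend(v.run_pu(predicate))
    return matches, scanned, len(matches)
```

For example, `partition(list(range(1000)), NUM_VAULTS)` followed by `nmp_select(vaults, lambda x: x % 100 == 0)` scans all 1000 rows inside the cube but returns only 10 matches to the host. This illustrates the general motivation that internal bandwidth absorbs the scan while off-chip traffic shrinks. The results arrive in vault order, not in sorted order.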