“…As a result, PuM architectures can provide high compute throughput by performing operations in a bulk parallel manner, often at the granularity of memory rows. Prior PuM works [70, 72, 74, 75, 79, 82, 84, 96, 97] propose mechanisms for the execution of bulk bitwise operations (e.g., bitwise MAJority, AND, OR, NOT) [72, 74, 78, 80, 82-85, 87, 91, 98] and bulk arithmetic operations [70, 75, 79, 96, 97]. However, these proposals have two important limitations: 1) the execution of some complex operations (e.g., multiplication, division) incurs high latency and energy consumption [75], and 2) other complex operations (e.g., exponentiation, trigonometric functions) are not even supported.…”
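
To make the notion of row-granularity bulk bitwise operations concrete, the following is a minimal software sketch, not a model of any specific cited design. It treats a memory row as a fixed-width bit vector and applies one operation to every bit position at once. The row width `ROW_BITS` and all function names are hypothetical. Deriving AND and OR from a three-input majority with an all-zeros or all-ones control row is a standard Boolean identity used here only for illustration; the excerpt itself does not state how the cited works implement these operations.

```python
# Illustrative software model of bulk bitwise operations on memory rows.
# Assumption: a "row" is a Python int masked to ROW_BITS bits. Real DRAM
# rows are far wider, and real PuM hardware operates on analog cell state.
ROW_BITS = 16                      # hypothetical row width
ALL_ONES = (1 << ROW_BITS) - 1     # control row holding all ones
ALL_ZEROS = 0                      # control row holding all zeros


def bulk_maj(a: int, b: int, c: int) -> int:
    """Bitwise majority of three rows: each output bit is 1 if at least
    two of the three input bits at that position are 1."""
    return ((a & b) | (b & c) | (a & c)) & ALL_ONES


def bulk_not(a: int) -> int:
    """Bitwise NOT of a row, restricted to the row width."""
    return ~a & ALL_ONES


def bulk_and(a: int, b: int) -> int:
    """AND expressed as majority with an all-zeros control row."""
    return bulk_maj(a, b, ALL_ZEROS)


def bulk_or(a: int, b: int) -> int:
    """OR expressed as majority with an all-ones control row."""
    return bulk_maj(a, b, ALL_ONES)


if __name__ == "__main__":
    row_a = 0b1100110011001100
    row_b = 0b1010101010101010
    # One call processes all ROW_BITS positions, mirroring how a single
    # row-wide operation acts on every column in parallel.
    print(format(bulk_and(row_a, row_b), f"0{ROW_BITS}b"))
    print(format(bulk_or(row_a, row_b), f"0{ROW_BITS}b"))
    print(format(bulk_not(row_a), f"0{ROW_BITS}b"))
```

In this sketch, one function call stands in for one row-wide operation. Building multiplication or division from such primitives requires long sequences of them, which is consistent with the high latency and energy cost the excerpt attributes to complex operations.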