Data movement between processing and memory units is the root cause of the limited performance and energy efficiency of modern von Neumann systems. To overcome this data-movement bottleneck, we present the memristive Memory Processing Unit (mMPU), a true processing-in-memory system in which computation is performed directly within the memory cells, eliminating the need for data transfer. Moreover, with its enormous internal parallelism, the system is ideal for data-intensive applications based on single instruction, multiple data (SIMD) processing, providing high throughput and energy efficiency.

Modern computers are typically based on the von Neumann architecture, in which the memory is separated from the processing unit and programs execute by moving data between the two. This incessant data movement is the leading cause of the performance bottleneck known as the memory wall, which has grown more severe over the years as CPU speed improvements have outpaced those of memory speed and bandwidth. Furthermore, with the demise of Dennard scaling, energy efficiency has become a major concern in modern computers; for example, moving data to off-chip DRAM consumes four orders of magnitude more energy than the computation itself.1

One approach to addressing the challenges arising from data movement is to move the computation closer to the memory. Both DRAM and emerging non-volatile memory technologies offer ample intrinsic parallelism, which goes unutilized today because of pin-limited integrated-circuit interfaces. Processing in memory (PIM) can exploit this intrinsic parallelism by avoiding high-latency, high-energy chip-to-chip data transfers, yielding massively parallel, high-performance, energy-efficient processing systems.
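To make the SIMD claim concrete, the following is a minimal toy model of in-memory bulk bitwise logic. It is an illustrative sketch only, not the actual mMPU circuit design: each memory column is modeled as a Python integer whose bits stand for the cells of that column across all rows, so a single logical operation acts on every row at once, mimicking the row-parallel execution that gives PIM its throughput. The choice of NOR as the primitive is an assumption here (stateful NOR is a common building block in memristive-logic proposals).

```python
# Toy model of SIMD-style in-memory bitwise logic (illustrative only;
# not the real mMPU design). A "column" is an integer whose bit i is
# the value stored in row i of that column.
ROWS = 8
MASK = (1 << ROWS) - 1  # keep results within ROWS bits

def simd_nor(col_a: int, col_b: int) -> int:
    """One 'cycle' of NOR applied to all ROWS rows in parallel."""
    return ~(col_a | col_b) & MASK

# Example: two stored columns, one operation covers all eight rows.
a = 0b10110010
b = 0b01100110
result = simd_nor(a, b)
print(f"{result:08b}")  # -> 00001001 (row-wise NOR of a and b)
```

The point of the model is that the cost of `simd_nor` is independent of the number of rows: widening `ROWS` widens the parallelism without adding operations, which is the property the mMPU exploits in hardware.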