Abstract-We introduce an architectural approach to improve memory system performance in both uniprocessor and multiprocessor systems. The architectural innovation is a flexible active memory controller backed by specialized cache coherence protocols that permit the transparent use of address remapping techniques. The resulting system shows significant performance improvement across a spectrum of machine configurations, from uniprocessors through single-node multiprocessors (SMPs) to distributed shared memory clusters (DSMs). Address remapping techniques exploit the data access patterns of applications to enhance their cache performance. However, they create coherence problems since the processor is allowed to refer to the same data via more than one address. While most active memory implementations require cache flushes, we present a new approach to solve the coherence problem. We leverage and extend the cache coherence protocol so that our techniques work transparently to efficiently support uniprocessor, SMP and DSM active memory systems. We detail the coherence protocol extensions to support our active memory techniques and present simulation results that show uniprocessor speedup from 1.3 to 7.6 on a range of applications and microbenchmarks. We also show remarkable performance improvement on small to medium-scale SMP and DSM multiprocessors, allowing some parallel applications to continue to scale long after their performance levels off on normal systems.Index Terms-Active memory systems, address remapping, cache coherence protocol, distributed shared memory, flexible memory controller architecture.