SUMMARY
Two of the main sources of inefficiency in current caches are the non‐uniform distribution of memory accesses across the cache sets, which causes misses due to the mapping restrictions of caches that are not fully associative, and access patterns with little locality, which degrade the performance of caches under the traditional least recently used (LRU) replacement policy. This paper proposes a technique that tackles both kinds of problems in a coordinated way in the context of chip multiprocessors, whose last‐level caches can be shared by threads with different locality patterns. Our proposal, called the thread‐aware mapping and replacement miss reduction (TAMR2) policy, tracks the behavior of each thread in each set in order to decide the appropriate combination of policies to deal with these problems. Despite its small overhead, TAMR2 achieved in our experiments on a four‐core system average reductions in power consumption and memory latency of 10% and 12%, respectively, resulting in an average throughput improvement of 5.6% relative to a traditional cache design. TAMR2 also outperformed many recent related approaches in the field. Copyright © 2013 John Wiley & Sons, Ltd.