Chip multiprocessor (CMP) systems have made the on-chip caches a critical resource shared among co-scheduled threads. Limited off-chip bandwidth, increasing on-chip wire delay, destructive inter-thread interference, and diverse workload characteristics pose key design challenges. To address these challenges, we propose CMP cooperative caching (CC), a unified framework to efficiently organize and manage on-chip cache resources. By forming a globally managed, shared cache out of cooperative private caches, CC can effectively support two important caching applications: (1) reduction of average memory access latency and (2) isolation of destructive inter-thread interference.

CC reduces the average memory access latency by balancing cache latency and capacity optimizations. Because it is built on private caches, CC naturally exploits their access latency benefits. To improve the effective cache capacity, CC forms a "shared" cache using replication control and LRU-based global replacement policies. Via cooperation throttling, CC provides a spectrum of caching behaviors between the two extremes of private and shared caches, thus enabling dynamic adaptation to suit workload requirements. We show that CC achieves a robust performance advantage over private and shared cache schemes across different processor, cache, and memory configurations, and a wide selection of multithreaded and multiprogrammed workloads.

To isolate inter-thread caching interference, we add a time-sharing aspect on top of spatial cache partitioning. Our approach uses Multiple Time-sharing Partitions (MTP) to simultaneously improve throughput and fairness while maintaining QoS over the longer term. Each MTP partition unfairly improves at least one thread's throughput, and partitions favoring different threads are scheduled in a cooperative, time-sharing manner to either maintain fairness and QoS, or implement priority. We also integrate MTP with CC's LRU-based capacity sharing policy to combine their benefits. The integrated scheme, Cooperative Caching Partitioning (CCP), divides the total execution epochs into those controlled by MTP and those controlled by the baseline CC policy, according to the fraction of threads that can benefit from each. Our simulation results show that, for a wide range of multiprogrammed workloads, CCP improves throughput, fairness, and QoS for workloads suffering from destructive interference, while achieving the performance benefit of the baseline CC policy for other workloads.
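To make the cooperation throttling idea concrete, the following C sketch models one plausible realization: a per-cache cooperation probability applied at private-cache eviction time. All names here (spill_to_peer, coop_prob, on_private_eviction) are illustrative assumptions, not identifiers from the dissertation's implementation.

```c
#include <stdlib.h>

typedef struct { unsigned long tag; } cache_block_t;

/* Hypothetical hook: place an evicted block into a peer core's private
 * cache (e.g., a randomly chosen neighbor); stubbed out in this sketch. */
static void spill_to_peer(int core_id, cache_block_t *victim)
{
    (void)core_id; (void)victim;  /* placeholder for the real spill path */
}

/* Called when core `core_id` evicts `victim` from its private cache. */
static void on_private_eviction(int core_id, cache_block_t *victim,
                                double coop_prob)
{
    /* Throttle cooperation: spill with probability coop_prob.
     * coop_prob = 0.0 -> pure private caches (never keep victims on chip);
     * coop_prob = 1.0 -> the aggregate approximates one shared cache. */
    if ((double)rand() / (double)RAND_MAX < coop_prob)
        spill_to_peer(core_id, victim);  /* shared-like: retain on chip */
    /* otherwise drop the block, as an ordinary private cache would */
}
```

Intermediate probabilities yield the spectrum of behaviors between the private and shared extremes that the throttling mechanism is meant to expose.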
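Similarly, the CCP epoch division can be sketched as a proportional split, assuming each thread has already been classified as benefiting from MTP or from the baseline CC policy (the classification itself is not shown, and all names are illustrative).

```c
#include <stdio.h>

typedef enum { POLICY_MTP, POLICY_CC } epoch_policy_t;

/* Split `total_epochs` between MTP and baseline CC in proportion to the
 * number of threads each policy benefits. */
static void plan_epochs(int total_epochs, int mtp_threads, int cc_threads,
                        epoch_policy_t plan[])
{
    int n = mtp_threads + cc_threads;
    int mtp_epochs = (n > 0) ? total_epochs * mtp_threads / n : 0;

    for (int e = 0; e < total_epochs; e++)
        plan[e] = (e < mtp_epochs) ? POLICY_MTP : POLICY_CC;
}

int main(void)
{
    /* Example: 3 of 4 threads suffer destructive interference (MTP helps),
     * while 1 thread is better served by the baseline CC policy. */
    epoch_policy_t plan[8];
    plan_epochs(8, 3, 1, plan);
    for (int e = 0; e < 8; e++)
        printf("epoch %d: %s\n", e, plan[e] == POLICY_MTP ? "MTP" : "CC");
    return 0;
}
```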