In large-scale chip multicore processors, last-level cache management and the core interconnection network play important roles in performance and power consumption. Mesh interconnects are widely used in such systems because they scale well and are simple to design. Because the interconnection network occupies significant chip area and consumes a significant share of system power, a bufferless network is an appealing alternative design that reduces power consumption and hardware cost. We have designed and implemented a simulator for distributed cache management in large chip multicores whose cores are connected by a bufferless interconnection network. We have also redesigned it and implemented DDGSim, a GPU-compatible parallel version of the same simulator that uses the CUDA programming model. We have simulated target large chip multicores with up to 43,000 cores and achieved up to 25 times speedup on an NVIDIA GeForce GTX 690 GPU over serial simulation.
I. INTRODUCTION
In a large-scale chip multicore (LCMP), on-chip cache management and the interconnection network have a significant impact on the performance and power consumption of the system. As the core count of a chip multicore increases, the pressure on the on-chip cache, in particular the last-level cache (LLC, here the L2 cache), increases significantly. A single physically shared cache performs poorly in terms of access latency and interference among cores. Such systems have several levels of cache: the first-level caches (L0 and L1) must be private, whereas the last-level cache (the L2 cache) must be larger and needs to be managed efficiently. A completely (physically) distributed cache is also a poor fit for the many cases in which a core requires a larger portion of the cache, because it suffers from increased local cache pressure and evictions. A logically shared, physically distributed (LSPD) model therefore captures both benefits: short access times and effective sharing. Among the various last-level cache management models, the LSPD model is promising in terms of cache utilization and overall system performance. Its performance depends on effective policies for cache block placement, eviction, migration, and directory management.
A mesh interconnection network is widely used to connect the cores in an LCMP because it provides a good trade-off between simplicity, scalability, and maintainability. As stated in [1-3], the interconnection network in an LCMP occupies a significant amount of area and consumes around 40% of total power, so a bufferless network is a promising alternative design for reducing hardware cost and power consumption when the overall network traffic is in the low-to-medium range.
It is therefore worthwhile to explore the design space of cache management policies that suit large multicore systems connected by a bufferless interconnection network.
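To make the LSPD and bufferless-mesh ideas concrete, the following Python sketch shows one possible way to map a block address to its home L2 slice in a mesh and to assign output ports in a single bufferless router without storing any flit. It is only an illustration under stated assumptions: the 64-byte block size, the static address interleaving, the fixed port order, the deflection rule, and all names (`Mesh`, `home_tile`, `route_bufferless`) are hypothetical and are not taken from DDGSim.

```python
from dataclasses import dataclass

BLOCK_BITS = 6  # assumption: 64-byte cache blocks


@dataclass(frozen=True)
class Mesh:
    rows: int
    cols: int

    def coords(self, tile):
        """Return (row, col) of a tile numbered in row-major order."""
        return divmod(tile, self.cols)

    def hops(self, src, dst):
        """Manhattan distance between two tiles (minimal hop count)."""
        (r1, c1), (r2, c2) = self.coords(src), self.coords(dst)
        return abs(r1 - r2) + abs(c1 - c2)


def home_tile(addr, mesh):
    """Assumed LSPD placement: block number modulo number of tiles."""
    return (addr >> BLOCK_BITS) % (mesh.rows * mesh.cols)


def productive_ports(mesh, cur, dst):
    """Output ports that move a flit closer to its destination."""
    (r, c), (dr, dc) = mesh.coords(cur), mesh.coords(dst)
    ports = []
    if dc > c:
        ports.append("E")
    if dc < c:
        ports.append("W")
    if dr > r:
        ports.append("S")
    if dr < r:
        ports.append("N")
    return ports


def route_bufferless(mesh, cur, flit_dsts):
    """Give every incoming flit a distinct output port in the same cycle.

    Assumes an interior router (four ports), at most four flits, and that
    none of them is destined for `cur`. Earlier flits have higher priority;
    a flit whose productive ports are all taken is deflected to a free port.
    """
    free = ["N", "E", "S", "W"]
    assignment = []
    for dst in flit_dsts:
        wanted = [p for p in productive_ports(mesh, cur, dst) if p in free]
        port = wanted[0] if wanted else free[0]  # deflect when necessary
        free.remove(port)
        assignment.append(port)
    return assignment


mesh = Mesh(rows=4, cols=4)
home = home_tile(0x1040, mesh)               # home slice of one block
distance = mesh.hops(5, home)                # hops from requesting tile 5
ports = route_bufferless(mesh, 5, [7, 6])    # two flits compete for "E"
```

In this sketch the second flit loses the contention for the east port and is deflected, which is why bufferless designs are attractive mainly at low-to-medium traffic: every deflection adds hops and latency instead of buffer area and power.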
In this work, we have designed an efficient simulator for the on-chip cache management of an LCMP whose cores are connected by a bufferless network. As most of available simul...