This paper introduces SIMinG-1k, a manycore simulator infrastructure. SIMinG-1k is a graphics processing unit (GPU) accelerated, parallel simulator for design-space exploration of large-scale manycore systems. It features an optimal trade-off between modeling accuracy and simulation speed. Its main objectives are high performance, flexibility, and the ability to simulate thousands of cores. SIMinG-1k can model different architectures (currently, we support ARM (Available from: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0100i/index.html) and Intel x86) using a two-step approach in which an architecture-specific front end is decoupled from a fast, parallel manycore virtual machine running on a GPU platform. We evaluate the simulator for target architectures with up to 4096 cores. Our results demonstrate very high scalability and almost linear speedup as the number of simulated cores increases.

S. RAGHAV ET AL.

computing domain, from High Performance Computing (HPC) to embedded systems. Examples of similar architectures may include on-chip manycore accelerators such as the Hypercore Architecture Line from Plurality [1], Platform 2012 [2], or future evolutions of Intel's prototypes Larrabee [3] and Single-Chip Cloud Computer [4]. Dark silicon pushes innovation towards specialization, where a single chip will include a spectrum of hardware accelerators to access and manipulate data in cloud workloads with minimal energy.

Simulation and virtual prototyping technology must evolve to tackle the numerous challenges inherent in simulating such highly parallel architectures. Current state-of-the-art sequential simulators use SystemC [5], binary translation, smart sampling techniques, or tunable abstraction levels for hardware description. These simulation technologies typically have to trade off simulation accuracy against simulation speed.
Because very low-level hardware operations are accurately modeled, simulation is slow. This can lead to unacceptable performance when simulating a huge number of cores.

Simulating a parallel system is an inherently parallel task. The simulation of an individual processor may proceed independently until the point where communication or synchronization with other processors is required. This is the key idea behind parallel simulation technology, which distributes the simulation workload over parallel hardware resources. Parallel simulators exploit the availability of multiple physical processing nodes to increase the simulation rate. However, this requirement may turn out to be much too costly when server clusters or computing farms are adopted as hosts for running simulations. The high cost, in terms of increasing latency and decreasing bandwidth, typically leads to poor scalability because of the synchronization overhead incurred as the number of processing nodes grows.

The development of computer technology has recently led to an unprecedented performance increase of general-purpose graphics processing units (GPGPUs). Modern GPGPUs integrat...