Lock contention aware thread migrations

Pusukuri, Kishore Kumar; Gupta, Rajiv; Bhuyan, Laxmi N.

doi:10.1145/2555243.2555273

Cited by 3 publications

(1 citation statement)

References 2 publications


“…The communication is a common performance bottleneck for fine-grained parallel applications (Yoo et al, 2013b). The authors in (Pusukuri et al, 2014) discuss the techniques used for improving the network performance by reducing lock contention and overlapping communications. In (Jagtap et al, 2012) are analyzed the performance of the ROSS simulation framework on different platforms and the multi-threaded implementation is compared with the MPI-based version.…”

Section: Literature Review (mentioning)

confidence: 99%

SEECSSim: A toolkit for parallel and distributed simulations for mobile devices

Maqbool, Malik, Naqvi, et al. 2019

Journal of Simulation


The rapid increase in the computing power of embedded and handheld devices has made these devices attractive for many applications, including simulation systems. A number of Parallel Discrete Event Simulation (PDES) frameworks exist, but most are designed for traditional cluster systems and are not suitable for battery-operated devices, where energy and power consumption are among the major concerns. A new PDES framework is thus required that takes into account the typical constraints of mobile devices. However, before designing a new PDES framework aimed specifically at mobile devices, it is helpful to analyze the performance of existing frameworks. In this paper, the well-known Rensselaer's Optimistic Simulation System (ROSS) framework has been instrumented for a detailed analysis of its performance in terms of CPU usage, memory consumption, and energy and power requirements. This profiling helps in several ways. For example, it allows selecting the most appropriate synchronization algorithm for running PDES frameworks on mobile devices. Additionally, identifying resource-intensive modules within the framework can be extremely useful when redesigning or optimizing these frameworks as they are ported to heterogeneous environments. Based on these observations, we propose a new simulation framework specifically designed to run on handheld devices. The framework, called SEECSSim, is the first one designed with the characteristics and constraints typical of mobile devices in mind. SEECSSim supports a number of state-of-the-art synchronization protocols and, thanks to its flexible design, users can easily integrate any other simulation model or synchronization algorithm of their choice. The proposed framework dynamically manages simulation on devices and also performs process migration to optimize the use of resources.
The performance of SEECSSim has been studied using a well-known simulation model (i.e. PHOLD) for different synchronization algorithms.


Jumbler: A lock-contention aware thread scheduler for multi-core parallel machines

Nisar, Aleem, Iqbal, et al. 2017

2017 International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom)


Fast and Portable Locking for Multicore Architectures

Lozi, David, Thomas, et al. 2016

ACM Trans. Comput. Syst.


The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server hardware thread. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock, because such data can typically remain in the server’s cache. Other contributions presented in this article include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX lock acquisitions into RCL locks. Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. By using RCL instead of Linux POSIX locks, performance is improved by up to 2.5 times on Memcached, and up to 11.6 times on Berkeley DB with the TPC-C client. On a SPARC machine with two Sun UltraSPARC T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to Solaris POSIX locks on Memcached, and up to 7.9 times on Berkeley DB with the TPC-C client.
