Compact NUMA-aware Locks

Dice, Dave; Kogan, Alex

doi:10.1145/3302424.3303984

Cited by 34 publications

(26 citation statements)

References 18 publications


“…Meanwhile, hierarchical locks [16,22] use batching to minimize the issue of cache-line bouncing in today's machines. CNA [21] and ShflLock [33] address the problem of hierarchical locks: memory overhead and degraded performance for a smaller number of cores. ShflLock presents a new ideology of designing lock algorithms by decoupling lock-policy from implementation.…”

Section: Locks: Past Present and Future? (mentioning)

confidence: 99%

Contextual concurrency control

Park, Calciu, Kim et al. 2021

Proceedings of the Workshop on Hot Topics in Operating Systems


Kernel synchronization primitives are of paramount importance to achieving good performance and scalability for applications. However, they are usually invisible and out of the reach of application developers. Instead, kernel developers and synchronization experts make all the decisions regarding kernel lock design. In this paper, we propose contextual concurrency control (C3), a new paradigm that enables userspace applications to tune concurrency control in the kernel. C3 allows developers to change the behavior and parameters of kernel locks, to switch between different lock implementations and to dynamically profile one or multiple locks for a specific scenario of interest. To showcase this idea, we designed and implemented Concord, a framework that allows a privileged userspace process to modify kernel locks on the fly without re-compiling the existing code base. We performed a preliminary evaluation on two locks showing that Concord allows userspace tuning of kernel locks without incurring significant overhead.

CCS Concepts: Computer systems organization → Multicore architectures; Software and its engineering → Mutual exclusion; Concurrency control; Scheduling.


“…On the other hand, the memory requirement of the local lock is proportional to the number of NUMA nodes, which is prohibitively expensive in certain environments such as database systems or an operating system kernel. CNA [11] is a variant of MCS lock augmented with NUMA-awareness, which addresses the memory issue caused by the above hierarchical locks while sustaining the hierarchical performance. Unlike MCS, it passes the lock to a successor running on the same NUMA node as the lock holder by moving waiting threads running on different NUMA nodes to a separate queue.…”

Section: Related Work (mentioning)

confidence: 99%
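The handover policy described in the statement above can be illustrated with a small sequential simulation. This is only a sketch of the queueing idea, not the CNA implementation: the names `pick_successor` and `handover_order` are hypothetical, waiters are modelled as `(thread_id, node)` tuples, and the fallback used when no same-node waiter exists (resume the skipped waiters in their original order) is an assumption, since the quoted text does not describe it.

```python
from collections import deque


def pick_successor(main, secondary, holder_node):
    """Choose the next lock holder from the main queue of (thread_id, node).

    Waiters on other nodes that are passed over are moved to `secondary`.
    """
    skipped = deque()
    while main:
        waiter = main.popleft()
        if waiter[1] == holder_node:
            # Same-node successor found: park the skipped remote waiters.
            secondary.extend(skipped)
            return waiter
        skipped.append(waiter)
    # Assumption: with no same-node waiter left, previously parked waiters
    # rejoin the main queue ahead of the ones skipped just now.
    main.extend(secondary)
    main.extend(skipped)
    secondary.clear()
    return main.popleft() if main else None


def handover_order(waiters, first_node):
    """Return the order in which waiters would receive the lock."""
    main, secondary = deque(waiters), deque()
    node, order = first_node, []
    while True:
        successor = pick_successor(main, secondary, node)
        if successor is None:
            return order
        order.append(successor[0])
        node = successor[1]


print(handover_order([("A", 0), ("B", 1), ("C", 0), ("D", 1)], first_node=0))
# prints ['A', 'C', 'B', 'D']
```

With strict FIFO order (A, B, C, D) the lock would cross NUMA nodes three times in this example; with the same-node preference it crosses once, and only two queues are needed regardless of the number of nodes.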

“…However, on NUMA machines, the execution for critical sections with less scaled locking schemes can easily cause performance collapse under high locking contention. As a result, there are many researches focusing on improving the scalability performance of locks [1,4-23], for which the most efficient alternatives are hierarchical NUMA-aware locks and delegation locks on NUMA multicore systems. Hierarchical locks consist of local locks each synchronizing threads running on the same NUMA node and a global lock synchronizing threads holding a local lock.…”

Section: Introduction (mentioning)

confidence: 99%
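The two-level structure described in the statement above (one local lock per NUMA node plus one global lock) can be sketched as follows. This is a minimal illustration under stated assumptions, not any published lock: the class name `HierarchicalLock`, the caller-supplied `node` argument and the `max_batch` fairness bound are all hypothetical, and Python threads do not actually run on distinct NUMA nodes. Note that the per-node state is exactly the memory cost that grows with the number of nodes.

```python
import threading


class HierarchicalLock:
    """Two-level lock: one local lock per node plus one global lock."""

    def __init__(self, num_nodes, max_batch=4):
        self._global = threading.Lock()
        self._local = [threading.Lock() for _ in range(num_nodes)]
        self._count_lock = threading.Lock()        # guards _waiting
        self._waiting = [0] * num_nodes            # waiters per node
        self._node_owns_global = [False] * num_nodes
        self._batch = [0] * num_nodes
        self._max_batch = max_batch                # hypothetical fairness bound

    def acquire(self, node):
        with self._count_lock:
            self._waiting[node] += 1
        self._local[node].acquire()
        with self._count_lock:
            self._waiting[node] -= 1
        # The fields below are protected by the local lock we now hold.
        if not self._node_owns_global[node]:
            self._global.acquire()
            self._node_owns_global[node] = True
            self._batch[node] = 0

    def release(self, node):
        self._batch[node] += 1
        with self._count_lock:
            local_waiters = self._waiting[node] > 0
        if local_waiters and self._batch[node] < self._max_batch:
            # Keep the global lock on this node and hand over locally.
            self._local[node].release()
        else:
            # Give other nodes a chance: drop the global lock first.
            self._node_owns_global[node] = False
            self._global.release()
            self._local[node].release()


def run_demo(num_nodes=2, threads_per_node=3, iters=200):
    """Hammer the lock and report (total increments, max threads inside)."""
    lock = HierarchicalLock(num_nodes)
    state = {"count": 0, "inside": 0, "max_inside": 0}

    def worker(node):
        for _ in range(iters):
            lock.acquire(node)
            state["inside"] += 1
            state["max_inside"] = max(state["max_inside"], state["inside"])
            state["count"] += 1
            state["inside"] -= 1
            lock.release(node)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(num_nodes) for _ in range(threads_per_node)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"], state["max_inside"]
```

Calling `run_demo()` should return the full increment count with at most one thread ever inside the critical section. The sketch relies on `threading.Lock` allowing release by a thread other than the one that acquired it, which is what lets the global lock stay with a node across local handovers.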

A scalable lock on NUMA multicore

Yao 2020

Concurrency and Computation


Modern NUMA multicore architectures exhibit complicated memory behavior, such as cache coherence invalidation and nonuniform memory access where the access from a core to its local memory is significantly faster than crossnode access to memory on a different NUMA node. The complicated memory behavior has a large impact on the efficiency of locking synchronization, which affects the performance of parallel applications. Prior works offer several efficient designs to improve locking performance such as delegation schemes. However, the existing delegation schemes either occupy computing cores or provide nonscalable performance, or offer less portability. In this work, we present a NUMA-aware delegation lock that occupies no cores while offering scalable performance under high contention for NUMA multicore machines. The new lock is a variant of an efficient FFWD lock, and inherits its performance features, such as buffering responses within a NUMA node to minimize cache coherence traffic. Unlike FFWD, the new lock employs hierarchical NUMA-aware memory allocation and NUMA-aware dynamic server thread technique, to reduce crossnode communication between client and server threads. Our evaluation shows that the new lock outperforms FFWD under high contention, achieving the significant performance gains when compared with other state-of-the-art locks.
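The general idea of delegation mentioned in this abstract, where clients hand their critical sections to a server thread instead of acquiring a lock themselves, can be sketched as below. This is a generic illustration with a dedicated server thread, not the lock proposed in the paper (which is described as occupying no cores) and not FFWD; `DelegationServer`, `delegate` and `run_delegation_demo` are hypothetical names, and there is no NUMA-aware buffering or error handling.

```python
import queue
import threading


class DelegationServer:
    """Runs delegated critical sections one at a time on a server thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            item = self._requests.get()
            if item is None:              # shutdown sentinel
                return
            func, args, reply = item
            # Only this thread touches the protected data, so no lock is
            # needed around the critical section itself.
            reply.put(func(*args))

    def delegate(self, func, *args):
        """Send a critical section to the server and wait for its result."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, args, reply))
        return reply.get()

    def stop(self):
        self._requests.put(None)
        self._thread.join()


def run_delegation_demo(num_clients=4, iters=100):
    """Clients delegate increments of a shared counter; returns the total."""
    server = DelegationServer()
    shared = {"count": 0}

    def increment():
        shared["count"] += 1
        return shared["count"]

    def client():
        for _ in range(iters):
            server.delegate(increment)

    clients = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    server.stop()
    return shared["count"]
```

Because the shared counter is modified only by the server thread, the data stays in that thread's cache rather than bouncing between clients, which is the effect delegation schemes aim for.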


“…To determine the critical section length that does not prolong the handover delay, we measured the lock performance using the locktorture [27], [28] test kernel module by varying the length of the critical section and the number of concurrent threads. The configuration of the many-core server used for this experiment was described in Table 4.…”

Section: B Exit Latency Optimization (mentioning)

confidence: 99%

Catnap: A Backoff Scheme for Kernel Spinlocks in Many-Core Systems

Woo, Kim et al. 2020

IEEE Access


As the number of cores equipped in a system grows, the impact of the spinlock waiting inside the operating system (OS) kernel on the performance and energy efficiency of a system worsens. In particular, it deteriorates the effectiveness of simultaneous multithreading (SMT). Because spinlocks are indispensable in OS kernels, it is necessary to suppress the spin wait overhead in the many-core systems. To address this issue, we propose the catnap spinlock that exploits the ACPI-C state, which is named as the catnap state and is induced by the MONITOR/MWAIT instruction pair. The catnap state releases the processor resources while deceiving the kernel that the thread is iterating a busy-waiting loop. Because entering and exiting from the C-state require considerable time, we applied the catnap loop only to the non-head waiters not to delay the lock handover operation. Furthermore, we selectively applied the catnap spinlock to the lock instances for sufficiently long critical sections based on the observation made in profiling runs. The proposed scheme was implemented in the Linux kernel and evaluated in a many-core processor system with a few workloads from the PARSEC and Re-aim benchmark suites. Our evaluation showed that the proposed scheme improved the performance by up to 33.59% and reduced energy consumption by 39.11%.
