Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue in this paper that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on a separate location in local memory, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Locations on which to spin may be local as a result of coherent caching, or by virtue of static allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates O(1) remote references per processor reaching the barrier, and observe that two previously known barriers can likewise be cast in a form that spins only on local locations. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes. We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly. Our principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors. The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called "dance hall" architectures, in which shared memory locations are equally far from all processors.
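The central idea (each waiter spins only on a location of its own, and the releaser ends that spin with a single write) lends itself to a compact illustration. Below is a minimal C sketch of a queue-based spin lock in that spirit; the struct and function names are illustrative, and the release path uses a compare-and-swap for brevity even though the abstract notes that a swap-with-memory instruction is the only hardware support required.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* One record per acquiring processor; each spins only on its own 'locked' flag. */
    typedef struct qnode {
        _Atomic(struct qnode *) next;
        _Atomic bool             locked;
    } qnode_t;

    typedef struct {
        _Atomic(qnode_t *) tail;   /* last node in the queue, or NULL if the lock is free */
    } qlock_t;

    void acquire(qlock_t *L, qnode_t *I) {
        atomic_store(&I->next, NULL);
        atomic_store(&I->locked, true);
        /* swap-with-memory: atomically enqueue ourselves and learn our predecessor */
        qnode_t *pred = atomic_exchange(&L->tail, I);
        if (pred != NULL) {
            atomic_store(&pred->next, I);
            while (atomic_load(&I->locked))    /* spin on our own (local) flag */
                ;
        }
    }

    void release(qlock_t *L, qnode_t *I) {
        qnode_t *succ = atomic_load(&I->next);
        if (succ == NULL) {
            qnode_t *expected = I;
            /* no known successor: try to mark the lock free */
            if (atomic_compare_exchange_strong(&L->tail, &expected, NULL))
                return;
            while ((succ = atomic_load(&I->next)) == NULL)   /* wait for enqueue to finish */
                ;
        }
        atomic_store(&succ->locked, false);    /* single remote write ends the successor's spin */
    }

Because each waiter's qnode can be allocated in (or cached into) memory local to that processor, the spin generates no remote traffic, and lock hand-off costs one remote write.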
The obstruction-free Dynamic Software Transactional Memory (DSTM) system of Herlihy et al. allows only one transaction at a time to acquire an object for writing. Should a second require an object currently in use, a contention manager must determine which may proceed and which must wait or abort. We analyze both new and existing policies for this contention management problem, using experimental results from a 16-processor SunFire machine. We consider both visible and invisible versions of read access, and benchmarks that vary in complexity, level of contention, tendency toward circular dependence, and mix of reads and writes. We present fair proportional-share prioritized versions of several policies, and identify a candidate default policy: one that provides, for the first time, good performance in every case we test. The tradeoff between visible and invisible reads remains application-specific: visible reads reduce the overhead for incremental validation when opening new objects, but the requisite bookkeeping exacerbates contention for the memory interconnect.
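To make the contention manager's role concrete, here is a hypothetical C sketch of one simple backoff-based policy in the spirit of the policies studied in this line of work. The interface (resolve_conflict, conflict_state_t) and the specific thresholds are assumptions for illustration only, not DSTM's actual API.

    #include <stdbool.h>
    #include <unistd.h>   /* usleep */

    /* Hypothetical hook: called when a transaction finds an object already
     * acquired for writing by another transaction.  Returns true if the
     * competing writer should be aborted, false if the caller should back
     * off and retry.  'attempts' must start at 0 for each new conflict. */
    typedef struct { int attempts; } conflict_state_t;

    bool resolve_conflict(conflict_state_t *cs) {
        /* Back off exponentially for a bounded number of rounds, then insist
         * by aborting the competing transaction. */
        if (cs->attempts < 8) {
            usleep(1u << cs->attempts);   /* 1, 2, 4, ... microseconds */
            cs->attempts++;
            return false;                 /* caller waits, then retries its open */
        }
        return true;                      /* caller aborts the competing writer */
    }

Different policies vary exactly this decision: how long to wait, whose priority to consult, and when aborting the other transaction is preferable to waiting.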
This paper provides a theoretical and practical framework for crash-resilient data structures on a machine with persistent (nonvolatile) memory but transient registers and cache. In contrast to certain prior work, but in keeping with "real world" systems, we assume a full-system failure model, in which all transient state (of all processes) is lost on a crash. We introduce the notion of durable linearizability to govern the safety of concurrent objects under this failure model and a corresponding relaxed, buffered variant which ensures that the persistent state in the event of a crash is consistent but not necessarily up to date. At the implementation level, we present a new "memory persistency model," explicit epoch persistency, that builds upon and generalizes prior work. Our model captures both hardware buffering and fully relaxed consistency, and subsumes both existing and proposed instruction set architectures. Using the persistency model, we present an automated transform to convert any linearizable, nonblocking concurrent object into one that is also durably linearizable. We also present a design pattern, analogous to linearization points, for the construction of other, more optimized objects. Finally, we discuss generic optimizations that may improve performance while preserving both safety and liveness.
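As a rough illustration of the kind of transform described, the following C sketch pushes a node onto a Treiber-style lock-free stack and issues write-backs and fences so that the node's contents reach persistent memory before the publishing store, and the publishing store persists before the operation returns. The pwb/pfence primitives and their x86 mappings are assumptions for illustration; the paper's explicit epoch persistency model and its automated transform are more general than this hand-written example.

    #include <stdatomic.h>
    #include <emmintrin.h>   /* _mm_clflush, _mm_sfence (x86; illustration only) */

    /* Hypothetical persistency primitives:
     * pwb(p)  : write back the cache line holding *p to nonvolatile memory
     * pfence(): order/complete earlier write-backs */
    #define pwb(p)    _mm_clflush((const void *)(p))
    #define pfence()  _mm_sfence()

    typedef struct node { int value; _Atomic(struct node *) next; } node_t;

    /* Durably linearizable push (sketch): persist data before publishing it,
     * and persist the publishing write before returning.  Assumes the node
     * fits in a single cache line. */
    void durable_push(_Atomic(node_t *) *top, node_t *n) {
        node_t *old;
        do {
            old = atomic_load(top);
            n->next = old;
            pwb(n); pfence();             /* node contents reach NVM first */
        } while (!atomic_compare_exchange_weak(top, &old, n));
        pwb(top); pfence();               /* publishing write reaches NVM before return */
    }

After a crash, recovery sees either the old stack or the new node fully linked, which is the consistency guarantee durable linearizability demands.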