Nikolaos D. Kallimanis scite author profile

Fine-grain thread synchronization has been proved, in several cases, to be outperformed by efficient implementations of the combining technique where a single thread, called the combiner , holding a coarse-grain lock, serves, in addition to its own synchronization request, active requests announced by other threads while they are waiting by performing some form of spinning. Efficient implementations of this technique significantly reduce the cost of synchronization, so in many cases they exhibit much better performance than the most efficient finely synchronized algorithms. In this paper, we revisit the combining technique with the goal to discover where its real performance power resides and whether or how ensuring some desired properties (e.g., fairness in serving requests) would impact performance. We do so by presenting two new implementations of this technique; the first (CC-Synch) addresses systems that support coherent caches, whereas the second (DSM-Synch) works better in cacheless NUMA machines. In comparison to previous such implementations, the new implementations (1) provide bounds on the number of remote memory references (RMRs) that they perform, (2) support a stronger notion of fairness, and (3) use simpler and less basic primitives than previous approaches. In all our experiments, the new implementations outperform by far all previous state-of-the-art combining-based and fine-grain synchronization algorithms. Our experimental analysis sheds light to the questions we aimed to answer. Several modern multi-core systems organize the cores into clusters and provide fast communication within the same cluster and much slower communication across clusters. We present an hierarchical version of CC-Synch, called H-Synch which exploits the hierarchical communication nature of such systems to achieve better performance. Experiments show that H-Synch significantly outper forms previous state-of-the-art hierarchical approaches. We provide new implementations of common shared data structures (like stacks and queues) based on CC-Synch, DSM-Synch and H-Synch. Our experiments show that these implementations outperform by far all previous (fine-grain or combined-based) implementations of shared stacks and queues.

show abstract

A highly-efficient wait-free universal construction

Fatourou

Kallimanis

2011

105

113

View full text Add to dashboard Cite

We present a new simple wait-free universal construction, called Sim, that uses just a Fetch&Add and an LL/SC object and performs a constant number of shared memory accesses. We have implemented Sim in a real shared-memory machine. In theory terms, our practical version of Sim, called P-Sim, has worse complexity than its theoretical analog; in practice though, we experimentally show that P-Sim outperforms several state-of-the-art lock-based and lock-free techniques, and this given that it is wait-free, i.e., that it satisfies a stronger progress condition than all the algorithms it outperforms.We have used P-Sim to get highly-efficient wait-free implementations of stacks and queues. Our experiments show that our implementations outperform the currently stateof-the-art shared stack and queue implementations which ensure only weaker progress properties than wait-freedom.

show abstract

Highly-Efficient Wait-Free Synchronization

Fatourou

Kallimanis

2013

Theory Comput Syst

View full text Add to dashboard Cite

The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems

Katevenis

Chrysos

Marazakis

et al. 2016

View full text Add to dashboard Cite

ExaNest is one of three European projects that support a ground-breaking computing architecture for exascale-class systems built upon power-efficient 64-bit ARM processors. This group of projects share an 'everything-close' and 'share-anything' paradigm, which trims down the power consumption - by shortening the distance of signals for most data transfers - as well as the cost and footprint area of the installation - by reducing the number of devices needed to meet performance targets. In ExaNeSt, we will design and implement: (i) a physical rack prototype and its liquid-cooling subsystem providing ultra-dense compute packaging, (ii) a storage architecture with distributed (in-node) non-volatile memory (NVM) devices, (iii) a unified, low-latency interconnect, designed to efficiently uphold desired Quality-of-Service guarantees for a mix of storage with inter-processor flows, and (iv) efficient rack-level memory sharing, where each page is cacheable at only a single node . Our target is to test alternative storage and interconnect options on actual hardware, using real-world HPC applications. The ExaNeSt consortium brings together technology, skills, and knowledge across the entire value chain, from computing IP, packaging, and system deployment, all the way up to operating systems, storage, HPC, big data frameworks, and cutting-edge applications

show abstract

The RedBlue Adaptive Universal Constructions

Fatourou

Kallimanis

2009

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.