Andrey V. Tabakov scite author profile

Parallel computing is one of the top priorities in computer science. The main means of processing information in parallel is the distributed computing system (CS): a composition of elementary machines that interact through a communication medium. Modern distributed CSs implement thread-level parallelism (TLP) within a single computing node (a multi-core CS with shared memory), as well as process-level parallelism (PLP) across the entire distributed CS. The main tool for developing parallel programs for such systems is the MPI standard. The need to create scalable parallel programs that use shared-memory compute nodes effectively has driven the development of the MPI standard, which today supports hybrid multi-threaded MPI programs. A hybrid multi-threaded MPI program combines the computational capabilities of processes and threads. The standard defines four levels of thread support: Single - one thread of execution; Funneled - a multi-threaded program in which only the main thread may perform MPI operations; Serialized - only one thread at a time may call MPI functions; Multiple - any thread may call MPI functions at any time. The main challenge of the Multiple mode is the need to synchronize the communicating threads within each process. This paper presents an overview of work that addresses the problem of synchronizing processes running on remote machines and synchronizing threads within a program. A method for synchronizing threads based on queues with relaxed operation semantics is proposed.
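The idea of a queue with relaxed semantics can be illustrated with a minimal C++ sketch. It is not the method proposed in the paper, whose details the abstract does not give. It shows one common relaxation under stated assumptions: items are spread over several independently locked sub-queues, so FIFO order holds only within a sub-queue, and threads contend less on any single lock. The names `RelaxedQueue`, `try_pop`, and `relaxed_queue_demo` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <random>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical relaxed FIFO queue (illustration only, not the paper's design).
// Items are spread over several sub-queues, each guarded by its own mutex.
// Global FIFO order is NOT guaranteed; order holds only inside one sub-queue.
template <typename T>
class RelaxedQueue {
public:
    // Assumes shards > 0.
    explicit RelaxedQueue(std::size_t shards) : shards_(shards) {}

    void push(T value) {
        const std::size_t start = random_index();
        // Prefer any sub-queue whose lock is currently free.
        for (std::size_t i = 0; i < shards_.size(); ++i) {
            Shard& s = shards_[(start + i) % shards_.size()];
            std::unique_lock<std::mutex> lock(s.mutex, std::try_to_lock);
            if (lock.owns_lock()) {
                s.items.push_back(std::move(value));
                return;
            }
        }
        // Every sub-queue was busy: block on the randomly chosen one.
        Shard& s = shards_[start];
        std::lock_guard<std::mutex> lock(s.mutex);
        s.items.push_back(std::move(value));
    }

    // Returns false if no item was found during one scan of all sub-queues.
    bool try_pop(T& out) {
        const std::size_t start = random_index();
        for (std::size_t i = 0; i < shards_.size(); ++i) {
            Shard& s = shards_[(start + i) % shards_.size()];
            std::lock_guard<std::mutex> lock(s.mutex);
            if (!s.items.empty()) {
                out = std::move(s.items.front());
                s.items.pop_front();
                return true;
            }
        }
        return false;
    }

private:
    struct Shard {
        std::mutex mutex;
        std::deque<T> items;
    };

    std::size_t random_index() const {
        // One generator per thread avoids sharing RNG state between threads.
        thread_local std::mt19937 rng{std::random_device{}()};
        std::uniform_int_distribution<std::size_t> dist(0, shards_.size() - 1);
        return dist(rng);
    }

    std::vector<Shard> shards_;
};

// Pushes from several threads, then drains the queue and counts the items.
std::size_t relaxed_queue_demo(std::size_t threads, std::size_t per_thread) {
    RelaxedQueue<int> queue(4);
    std::vector<std::thread> producers;
    for (std::size_t t = 0; t < threads; ++t) {
        producers.emplace_back([&queue, per_thread] {
            for (std::size_t i = 0; i < per_thread; ++i) {
                queue.push(static_cast<int>(i));
            }
        });
    }
    for (auto& p : producers) p.join();

    std::size_t popped = 0;
    int value = 0;
    while (queue.try_pop(value)) ++popped;
    return popped;
}
```

The sketch trades ordering for lower contention: no item is lost, but two items pushed in sequence may be popped in the opposite order. Under concurrent use, `try_pop` can also return false while another thread is inserting into a sub-queue that has already been scanned.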


Evaluating the performance of atomic operations on modern multicore systems

Goncharenko, Paznikov, Tabakov. 2019. J. Phys.: Conf. Ser.


In this work we analyse the efficiency of the atomic operations compare-and-swap (CAS), fetch-and-add (FAA), swap (SWP), load and store on modern multicore processors. These operations, implemented in hardware as processor instructions, are in high demand in multithreaded programming (the design of thread locks and non-blocking data structures). In this article we study the influence of the cache coherence protocol and of the size and locality of the data on the latency of the operations. We developed a benchmark for analysing how throughput and latency depend on these parameters. We present the results of evaluating the efficiency of atomic operations on modern x86-64 processors and give recommendations for optimization. In particular, we identified the atomic operations with minimum latency (load), maximum latency ("successful CAS", store) and comparable latency ("unsuccessful CAS", FAA, SWP). We showed that the choice of processor core that performs the operation and the state of the cache line affect the latency by factors of 1.5 and 1.3 on average, respectively. A suboptimal choice of these parameters may reduce the throughput of atomic operations by a factor of 1.1 to 7.2. Our findings may be used in the design of new concurrent data structures and synchronization primitives and in the optimization of existing ones.
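The five operations named in the abstract map directly onto `std::atomic` member functions in C++. The sketch below is not the authors' benchmark. It is a minimal single-threaded timing loop under stated assumptions: it does not pin threads to cores, control cache-line state, or vary the data size, all of which the paper studies. The names `AtomicBenchResult`, `time_ns_per_op`, and `run_atomic_bench` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result record: average latency per operation in nanoseconds,
// plus counters that let a caller check the loops behaved as intended.
struct AtomicBenchResult {
    double load_ns = 0, store_ns = 0, faa_ns = 0, swp_ns = 0;
    double cas_ok_ns = 0, cas_fail_ns = 0;
    std::uint64_t faa_final = 0;       // counter value after the FAA loop
    std::uint64_t cas_ok_count = 0;    // CAS calls that succeeded
    std::uint64_t cas_fail_count = 0;  // CAS calls that failed
    std::uint64_t checksum = 0;        // keeps loaded values observable
};

// Runs op(i) for i in [0, iters) and returns the mean time per call.
// Assumes iters > 0.
template <typename F>
double time_ns_per_op(std::uint64_t iters, F op) {
    const auto t0 = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) op(i);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
           static_cast<double>(iters);
}

AtomicBenchResult run_atomic_bench(std::uint64_t iters) {
    AtomicBenchResult r;
    std::atomic<std::uint64_t> x{0};

    r.load_ns = time_ns_per_op(iters, [&](std::uint64_t) {
        r.checksum += x.load();
    });
    r.store_ns = time_ns_per_op(iters, [&](std::uint64_t i) { x.store(i); });

    x.store(0);
    r.faa_ns = time_ns_per_op(iters, [&](std::uint64_t) { x.fetch_add(1); });
    r.faa_final = x.load();

    r.swp_ns = time_ns_per_op(iters, [&](std::uint64_t i) {
        r.checksum += x.exchange(i);
    });

    // Successful CAS: the expected value always equals the current value.
    x.store(0);
    r.cas_ok_ns = time_ns_per_op(iters, [&](std::uint64_t i) {
        std::uint64_t expected = i;
        if (x.compare_exchange_strong(expected, i + 1)) ++r.cas_ok_count;
    });

    // Unsuccessful CAS: x stays 0 and the expected value is never 0.
    x.store(0);
    r.cas_fail_ns = time_ns_per_op(iters, [&](std::uint64_t i) {
        std::uint64_t expected = i + 1;
        if (!x.compare_exchange_strong(expected, 0)) ++r.cas_fail_count;
    });
    return r;
}
```

Because only one thread touches the variable here, the cache line stays local to one core, so the timings are best-case figures and will not show the core-placement and cache-state effects reported in the abstract. The counter checks in each loop also add a small overhead to the measured times.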


Modelling of Parallel Threads Synchronization in Hybrid MPI + Threads Programs

Tabakov, Paznikov. 2019.


Performance Modeling of Atomic Operations in Control Systems Based on Multicore Computer Systems

Goncharenko, Paznikov, Tabakov. 2019.



Andrey V. Tabakov

Algorithms for Optimization of Relaxed Concurrent Priority Queues in Multicore Systems

Using relaxed concurrent data structures for contention minimization in multithreaded MPI programs
