Transparent adaptation of sharing granularity in MultiView‐based DSM systems

Proceedings of the 18th Annual International Conference on Supercomputing

2004

A Distributed Shared Memory (DSM) system provides a distributed application with a shared virtual address space. Choosing a memory consistency model is one of the main decisions in designing a DSM system. While Sequential Consistency provides a simple and intuitive programming model, relaxed consistency models allow memory accesses to be parallelized, improving runtime performance. We implement the home-based lazy release consistency (HLRC) protocol that supports preemptive multithreading and compare its performance with the efficient multithreaded SC protocol. We perform an "apple-to-apple" comparison on the same testbed environment and benchmark suite, and investigate the effectiveness and scalability of both these protocols.
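The difference between the two models can be made concrete with a toy simulation. The sketch below is only an illustration of the home-based idea, not the authors' implementation: the `Home` and `Node` classes and their methods are hypothetical names, each "page" is a single value, and `acquire` conservatively invalidates every clean cached page, whereas a real HLRC protocol tracks which pages were actually modified.

```python
from dataclasses import dataclass, field


@dataclass
class Home:
    """Hypothetical home node: holds the master copy of every page."""
    pages: dict = field(default_factory=dict)


@dataclass
class Node:
    """Hypothetical non-home node with a local page cache."""
    home: Home
    cache: dict = field(default_factory=dict)  # local page copies
    dirty: set = field(default_factory=set)    # pages written since last release

    def read(self, page):
        # A miss fetches the page from its home (default value 0).
        if page not in self.cache:
            self.cache[page] = self.home.pages.get(page, 0)
        return self.cache[page]

    def write(self, page, value):
        # Writes stay local until the next release.
        self.cache[page] = value
        self.dirty.add(page)

    def release(self):
        # Push local modifications to the home node.
        for page in self.dirty:
            self.home.pages[page] = self.cache[page]
        self.dirty.clear()

    def acquire(self):
        # Toy simplification: drop every clean cached page so that the
        # next read refetches it from the home node.
        self.cache = {p: v for p, v in self.cache.items() if p in self.dirty}


home = Home()
a, b = Node(home), Node(home)

b.read("x")            # b caches the initial value of page "x"
a.write("x", 42)       # a modifies its local copy only
stale = b.read("x")    # b still sees its cached copy
a.release()            # a's update reaches the home node
b.acquire()            # b invalidates its clean copies
fresh = b.read("x")    # b refetches the page from the home node
print(stale, fresh)
```

Between the write and the release/acquire pair, `b` observes an old value, which a sequentially consistent system would not permit once the write has taken effect. A program with enough synchronization never performs that unsynchronized read, so it cannot tell the two models apart.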

Section: Performance Analysis Of Barnes-sp (mentioning)

confidence: 87%

Section: Performance Evaluation (mentioning)

confidence: 97%

Section: An Efficient Implementation Of SC (mentioning)

confidence: 99%

A comparison of sequential consistency with home-based lazy release consistency for software distributed shared memory

Iosevich

Proceedings of the 18th Annual International Conference on Supercomputing

2004

“…ISR event handling reduces the response time for asynchronous messages by 33 percent relative to user-level signal handlers, and our memory primitives outperform the corresponding system calls for changing the protection of page groups by an order of magnitude. While the full benefits of our memory services were not realized in our protocol (only single-page groups were used), we expect them to substantially improve the performance of DSM protocols that require multiple instantaneous page-protection changes (e.g., RC protocols and adaptive-granularity SC protocols [22]). We have shown how a high-level protocol can be split between interrupt and process contexts without introducing harmful data races or compromising other OS activity.…”

Section: DSM Conclusion and Opportunities (mentioning)

confidence: 99%

“…Our application suite comprises eight applications: Water-nsquared (Water), LU-contiguous (LU), and Barnes-Hut (Barnes) from SPLASH-2 [19]; Integer-Sort (IS) from the NAS parallel benchmarks [20]; Successive Over-Relaxation (SOR) and the Traveling Salesperson Problem (TSP) from the TreadMarks [21] benchmark applications; N-Body (NBody) and N-Body-Write (NBodyW) are computation kernels that imitate N-body applications [22]. See Table 3 for the input data sets used for each application.…”

Section: Applications (mentioning)

confidence: 99%

In-kernel integration of operating system and infiniband functions for high performance computing clusters: a DSM example

Liss

Birk

IEEE Trans. Parallel Distrib. Syst.

2005

The Infiniband (IB) System Area Network (SAN) enables applications to access hardware directly from user level, reducing the overhead of user-kernel crossings during data transfer. However, distributed applications that exhibit close coupling between network and OS services may benefit from accessing IB from the kernel through IB's native Verbs interface, which permits tight integration of these services. We assess this approach using a sequential-consistency Distributed Shared Memory (DSM) system as an example. We first develop primitives that abstract the low-level communication and kernel details, and efficiently serve the application's communication, memory, and scheduling needs. Next, we combine the primitives to form a kernel DSM protocol. The approach is evaluated using our full-fledged Linux kernel DSM implementation over Infiniband. We show that overheads are reduced substantially, and overall application performance is improved in terms of both absolute execution time and scalability relative to an entirely user-level implementation.

Index Terms: Hardware/software interfaces, high-speed networks, distributed shared memory, parallel computing.

Software Distributed Shared Memory: a VIA-based implementation and comparison of sequential consistency with home-based lazy release consistency

Iosevich

2005

Softw: Pract. Exper.

A Distributed Shared Memory (DSM) system provides a distributed application with a shared virtual address space. This article proposes a design for implementing the DSM communication layer on top of the Virtual Interface Architecture (VIA), an industry standard for user-level networking protocols on high-speed clusters. User-level communication protocols operate in user mode, thus removing the operating system kernel's overhead from the critical communication path, and significantly diminishing communication overhead as a result. We analyze VIA's facilities and limitations in order to ascertain which implementation trade-offs can be best applied to our development of an efficient communication substrate optimized for DSM requirements. We then implement a multithreaded version of the Home-based Lazy Release Consistency (HLRC) protocol on top of this substrate. In addition, we compare the performance of this HLRC protocol with that of the Sequential Consistency (SC) protocol in which a MULTIVIEW (MV) memory mapping technique was used. This technique enables fine-grained access to shared memory, while still relying on the virtual memory hardware to track memory accesses. We perform an 'apple-to-apple' comparison on the same testbed environment and benchmark suite, and investigate the effectiveness and scalability of both protocols.

… synchronization points are reached; that is, between these synchronization points, the shared memory may appear inconsistent to different processors. These alternate models guarantee, for properly labeled [4] programs, results equivalent to those of a sequentially consistent system. Informally, a program is properly labeled if the program contains enough synchronization to avoid data races. Synchronization operations are divided into ACQUIRE and RELEASE operations, used respectively to obtain and yield exclusive access to shared data. These operations can be thought of as standard lock operations.

Lazy Release Consistency (LRC) [5,6] is a refinement of the Release Consistency (RC) model [4]. The RC model requires that shared memory accesses be performed globally upon a RELEASE operation only. The idea of LRC is to make those accesses visible only to the processor that acquires a lock rather than perform all operations globally. A home-based implementation of LRC (HLRC) was proposed by Iftode [10]. In this implementation each shared page has an assigned home node that always hosts the most up-to-date contents of the page. These updated contents may be fetched by a non-home node that needs an updated version.

Contribution

This work compares the runtime performance of two multithreaded memory coherence protocols: a multithreaded implementation of the HLRC model and an efficient multithreaded implementation of the SC model that uses a MULTIVIEW [11,12] memory mapping technique. We also examine and compare the scalability of these protocols to a multithreaded mode of execution. Previous studies proposed non-preemptive multithreading [13] or creati...