Bulk Synchronous Parallelism (BSP) is a parallel programming model that abstracts from low-level program structures in favour of supersteps. A superstep consists of a set of independent local computations, followed by a global communication phase and a barrier synchronisation. Structuring programs in this way enables their costs to be accurately determined from a few simple architectural parameters, namely the permeability of the communication network to uniformly random traffic and the time to synchronise. Although permutation routing and barrier synchronisations are widely regarded as inherently expensive, this is not the case. As a result, the structure imposed by BSP does not reduce performance, while bringing considerable benefits for application building. This paper answers the most common questions we are asked about BSP and justifies its claim to be a major step forward in parallel programming.

Why Is Another Model Needed?

In the 1980s, a large number of different types of parallel architectures were developed. This variety may have been necessary to thoroughly explore the design space but, in retrospect, it had a negative effect on the commercial development of parallel applications software. To achieve acceptable performance, software had to be carefully tailored to the specific architectural properties of each computer, making portability almost impossible. Each new generation of processors appeared in strikingly different parallel architectural frameworks, forcing performance-driven software developers to redesign their applications from the ground up. Understandably, few were keen to join this process.

Today, the number of parallel computation models and languages probably exceeds the number of different architectures with which parallel programmers had to contend ten years ago. Most make it hard to achieve portability, hard to achieve performance, or both. The two largest classes of models are based on message passing, and on shared memory.
Those based on message passing are inadequate for three reasons. First, messages require paired actions at the sender and receiver, and it is difficult to ensure that these are correctly matched. Second, messages blend communication and synchronisation so that sender and receiver must be in appropriately consistent states when the communication takes place. This is appallingly difficult to ensure in most models, and programs are prone to deadlock as a result. Third, the performance of such programs is impossible to predict because the interaction of large numbers of individual messages in the interconnection mechanism makes the variance in their delivery times large.

The argument for shared-memory models is that they are easier to program because they provide the abstraction of a single, shared address space. A whole class of placement decisions is avoided. This is true, but is only half of the issue. When memory is shared, simultaneous access to the same location must be prevented. This requires either PRAM-style discipline by the programmer, or expensive lock mana...
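The superstep structure described above (independent local computation, then global communication, then a barrier) can be sketched in a few lines. This is a minimal illustration only: it simulates processors with Python threads, and the function `run_bsp`, the per-processor inboxes and the all-to-all exchange of partial sums are assumptions made for the example, not part of any BSP library or of the text above.

```python
import threading


def run_bsp(p, local_data):
    """Simulate two supersteps on p processors; return each processor's result."""
    barrier = threading.Barrier(p)
    inboxes = [[] for _ in range(p)]            # one message queue per processor
    locks = [threading.Lock() for _ in range(p)]
    results = [None] * p

    def worker(pid):
        # Superstep 1, local computation: touch only this processor's data.
        partial = sum(local_data[pid])
        # Superstep 1, communication: send the partial sum to every processor.
        for dest in range(p):
            with locks[dest]:
                inboxes[dest].append(partial)
        # Barrier synchronisation: received data may be used only after this.
        barrier.wait()
        # Superstep 2, local computation on the data delivered in superstep 1.
        results[pid] = sum(inboxes[pid])

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_bsp(4, [[1, 2], [3, 4], [5, 6], [7, 8]]))
```

Because no processor reads its inbox before the barrier, there are no paired send and receive actions to match, which is the contrast with message passing drawn above.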
The model of bulk-synchronous parallel (BSP) computation is an emerging paradigm of general-purpose parallel computing. Its modification, the BSPRAM model, allows one to combine the advantages of distributed and shared-memory style programming. In this paper we study the BSP memory complexity of matrix multiplication. We propose new memory-efficient BSP algorithms both for standard and for fast matrix multiplication. The BSPRAM model is used to simplify the description of the algorithms. The communication and synchronization complexity of our algorithms is slightly higher than that of known time-efficient BSP algorithms. The current time-efficient and new memory-efficient algorithms are connected by a continuous tradeoff.

Introduction

The model of bulk-synchronous parallel (BSP) computation (see [16], [10], [11], and [13]) provides a simple and practical framework for general-purpose parallel computing. Its main goal is to support the creation of architecture-independent and scalable parallel software. The key features of BSP are the treatment of the communication medium as an abstract fully connected network, and explicit and independent costing of communication and synchronization.

Originally BSP was defined as a distributed memory model with point-to-point communication between the processors. In [15] the BSPRAM model, a variant of BSP based on a mixture of shared and distributed memory, was proposed. Paper [15] also identified some properties of a BSPRAM algorithm that suffice for its optimal simulation in BSP. Algorithms possessing at least one of these properties (communication-obliviousness, high slackness, high granularity) are abundant in scientific and industrial computing.

The efficiency of many parallel applications is constrained by a limited amount of available memory. In this paper we extend the BSPRAM model to account for memory efficiency of a computation.
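The explicit and independent costing of communication and synchronization mentioned above can be illustrated with a minimal sketch. It assumes the commonly used BSP cost expression, in which one superstep costs w + h·g + l. The symbols w, h, g and l, and the names `Machine`, `Superstep`, `totals` and `program_cost`, are illustrative assumptions and are not notation taken from this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    g: float  # assumed: cost of communicating one word, in local-operation units
    l: float  # assumed: cost of one barrier synchronisation, in the same units


@dataclass(frozen=True)
class Superstep:
    w: int  # maximum local work performed by any processor
    h: int  # maximum number of words sent or received by any processor


def totals(steps):
    """Return (W, H, S): total computation, communication and synchronisation."""
    W = sum(s.w for s in steps)
    H = sum(s.h for s in steps)
    S = len(steps)
    return W, H, S


def program_cost(steps, machine):
    """Combine the three independent totals into one cost: W + H*g + S*l."""
    W, H, S = totals(steps)
    return W + H * machine.g + S * machine.l


if __name__ == "__main__":
    steps = [Superstep(w=1000, h=10), Superstep(w=500, h=50)]
    print(totals(steps))
    print(program_cost(steps, Machine(g=4.0, l=100.0)))
```

Keeping W, H and S separate until the final step is what makes an algorithm's analysis architecture-independent: only `program_cost` depends on the machine parameters.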
We present new BSPRAM algorithms for matrix multiplication, considering both the standard method with sequential time complexity Θ(n^3), and fast Strassen-type methods with sequential time complexity Θ(n^ω), ω < 3. The new algorithms achieve a better memory performance than the McColl-Valiant time-efficient algorithm from [11] and [13] for the standard matrix multiplication, or the time-efficient algorithm from [12] for fast matrix multiplication. Communication and synchronization