The model of bulk-synchronous parallel (BSP) computation is an emerging paradigm of general-purpose parallel computing. Its modification, the BSPRAM model, allows one to combine the advantages of distributed-memory and shared-memory style programming. In this paper we study the BSP memory complexity of matrix multiplication. We propose new memory-efficient BSP algorithms for both standard and fast matrix multiplication. The BSPRAM model is used to simplify the description of the algorithms. The communication and synchronization costs of our algorithms are slightly higher than those of known time-efficient BSP algorithms. The existing time-efficient and the new memory-efficient algorithms are connected by a continuous tradeoff.
Introduction. The model of bulk-synchronous parallel (BSP) computation (see [16], [10], [11], and [13]) provides a simple and practical framework for general-purpose parallel computing. Its main goal is to support the creation of architecture-independent and scalable parallel software. The key features of BSP are the treatment of the communication medium as an abstract fully connected network, and the explicit and independent costing of communication and synchronization.

Originally, BSP was defined as a distributed-memory model with point-to-point communication between the processors. In [15] the BSPRAM model, a variant of BSP based on a mixture of shared and distributed memory, was proposed. Paper [15] also identified some properties of a BSPRAM algorithm that suffice for its optimal simulation in BSP. Algorithms possessing at least one of these properties (communication-obliviousness, high slackness, high granularity) are abundant in scientific and industrial computing.

The efficiency of many parallel applications is constrained by the limited amount of available memory. In this paper we extend the BSPRAM model to account for the memory efficiency of a computation. We present new BSPRAM algorithms for matrix multiplication, considering both the standard method, with sequential time complexity Θ(n^3), and fast Strassen-type methods, with sequential time complexity Θ(n^ω), ω < 3. The new algorithms achieve better memory performance than the McColl-Valiant time-efficient algorithm from [11] and [13] for standard matrix multiplication, or the time-efficient algorithm from [12] for fast matrix multiplication. Communication and synchronization
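As an illustration (not part of the paper's algorithms), the two sequential baselines mentioned above can be sketched in Python: the standard triple-loop method, which performs Θ(n^3) operations, and Strassen's recursion, which replaces 8 recursive block products by 7 and so attains Θ(n^ω) with ω = log2 7 ≈ 2.81. The sketch assumes n is a power of two and uses plain lists of lists.

```python
def mat_mul(A, B):
    """Standard triple-loop multiplication: Theta(n^3) operations."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def split(A):
    """Split an n x n matrix (n even) into four n/2 x n/2 blocks."""
    h = len(A) // 2
    return ([row[:h] for row in A[:h]], [row[h:] for row in A[:h]],
            [row[:h] for row in A[h:]], [row[h:] for row in A[h:]])

def strassen(A, B):
    """Strassen's method: 7 recursive products per level instead of 8,
    giving Theta(n^(log2 7)) operations overall."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    M2 = strassen(mat_add(A21, A22), B11)
    M3 = strassen(A11, mat_sub(B12, B22))
    M4 = strassen(A22, mat_sub(B21, B11))
    M5 = strassen(mat_add(A11, A12), B22)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22))
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_sub(mat_add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

Note that a naive implementation of Strassen's recursion already hints at the memory issue studied in the paper: the intermediate products M1, ..., M7 require extra workspace at every level of the recursion, which is precisely the kind of cost that a memory-efficient formulation must control.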