Matrix multiplication is a key primitive in block matrix algorithms such as those found in LAPACK. We present results from our study of matrix multiplication algorithms on the Intel Touchstone Delta, a distributed-memory message-passing architecture with a two-dimensional mesh topology. We analyze and compare three algorithms and obtain an implementation, BiMMeR, that uses communication primitives highly suited to the Delta and exploits the single-node assembly-coded matrix multiplication. Our algorithm is completely general, i.e. able to deal with various data layouts as well as arbitrary mesh aspect ratios and matrix dimensions, and has achieved a parallel efficiency of 86%, with overall peak performance in excess of 8 Gflops on 256 nodes for an 8800 x 8800 matrix. We describe BiMMeR's design and implementation and present performance results that demonstrate scalability and robust behavior over varying mesh topologies.
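For readers unfamiliar with the metric, parallel efficiency here can be read as the sustained aggregate rate divided by the product of the node count and the single-node matrix-multiplication rate. The sketch below is illustrative only; the per-node baseline rate is a placeholder, since the abstract does not state the figure used in the paper.

```python
def parallel_efficiency(aggregate_gflops, nodes, node_gflops):
    """Efficiency relative to the single-node kernel rate replicated on every node."""
    return aggregate_gflops / (nodes * node_gflops)

# Placeholder baseline: the abstract does not give the single-node rate,
# so 0.036 Gflops/node is a hypothetical value, not a measured figure.
print(parallel_efficiency(8.0, 256, 0.036))  # ~0.87
```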
This paper compares two general library routines for performing parallel distributed matrix multiplication. The PUMMA algorithm utilizes a block-scattered data layout, whereas BiMMeR utilizes a virtual 2-D torus wrap. The algorithmic differences resulting from these different layouts are discussed, as well as the general issues associated with different data layouts for library routines. Results on the Intel Delta for the two matrix multiplication algorithms are presented.
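To make the layout difference concrete, the sketch below contrasts, along one mesh dimension, how a block-scattered (block-cyclic) layout and a torus-wrap layout map a global index to a (process, local index) pair. The process count `p` and block size `b` are illustrative parameters, not values taken from either library.

```python
# Illustrative only: neither function is taken from PUMMA or BiMMeR source code.

def block_scattered(i, p, b):
    """Block-scattered (block-cyclic) layout: global index i belongs to global
    block i // b, and blocks are dealt round-robin over p processes."""
    proc = (i // b) % p
    local = (i // (b * p)) * b + (i % b)
    return proc, local

def torus_wrap(i, p):
    """Torus-wrap layout: individual indices are dealt round-robin,
    i.e. block-scattered with block size 1."""
    return i % p, i // p

if __name__ == "__main__":
    p, b = 4, 2  # hypothetical mesh dimension and block size
    for i in range(8):
        print(i, block_scattered(i, p, b), torus_wrap(i, p))
```

Applying such a one-dimensional map independently to rows and columns yields the two-dimensional distributions the two libraries build on.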
In this document, we are concerned with the effects of data layouts for nonsquare processor meshes on the implementation of common dense linear algebra kernels such as matrix-matrix multiplication, LU factorization, or eigenvalue solvers. In particular, we address ease of programming and tunability of the resulting software. We introduce a generalization of the torus wrap data layout that decouples the "local" and "global" views of the data layout. As a result, it allows for intuitive programming of linear algebra algorithms and for tuning of the algorithm for a particular mesh aspect ratio or machine characteristics. This layout is as simple as the proposed HPF layout but, in our opinion, enhances both ease of programming and ease of performance tuning. We emphasize that we do not advocate that all users need be concerned with these issues. We do, however, believe that for the foreseeable future "assembler coding" (as message-passing code is likely to be viewed from an HPF programmer's perspective) will be needed to deliver high performance for computationally intensive kernels. We therefore believe that adopting this approach would not only accelerate the generation of efficient linear algebra software libraries but also accelerate the adoption of HPF itself. We point out, however, that adopting this new layout would require an HPF compiler to ensure that data objects are operated on in a consistent fashion across subroutine and function calls.
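One way to picture the local/global decoupling is that, under a 2-D torus-wrap style distribution, every process owns a strided slice of the global matrix that is dense in its local view, so a tuned single-node kernel can be applied without any knowledge of the global layout. The mapping below is a minimal sketch of that idea for a Pr x Pc mesh, not the paper's exact generalization.

```python
import numpy as np

def torus_wrap_2d(A, Pr, Pc):
    """Distribute a global matrix A over a Pr x Pc process mesh by 2-D torus
    wrap: element (i, j) goes to process (i % Pr, j % Pc) at local position
    (i // Pr, j // Pc).  Each process's piece is a dense local matrix."""
    return {(r, c): A[r::Pr, c::Pc].copy()
            for r in range(Pr) for c in range(Pc)}

if __name__ == "__main__":
    A = np.arange(6 * 8).reshape(6, 8)     # small global matrix for illustration
    blocks = torus_wrap_2d(A, Pr=2, Pc=4)  # hypothetical 2 x 4 process mesh
    print(blocks[(0, 0)])                  # a dense 3 x 2 local block
```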