Jack Poulson scite author profile

A Parallel Sweeping Preconditioner for Heterogeneous 3D Helmholtz Equations

A parallelization of a sweeping preconditioner for three-dimensional Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be $O(\gamma^2 N^{4/3})$ and $O(\gamma N \log N)$, where $\gamma(\omega)$ denotes the modestly frequency-dependent number of grid points per perfectly matched layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: Parallel Sweeping Preconditioner (PSP) and the underlying distributed multifrontal solver, Clique.

Designing Linear Algebra Algorithms by Transformation: Mechanizing the Expert Developer

Marker, Poulson, Batory et al. 2013

A Butterfly Algorithm for Synthetic Aperture Radar Imaging

Demanet, Ferrara, Maxwell et al. 2012. SIAM J. Imaging Sci.

A Parallel Butterfly Algorithm

Poulson, Demanet, Maxwell et al. 2014. SIAM J. Sci. Comput.

The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x, y) g(y)\,dy$ at large numbers of target points when the kernel, $K(x, y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most O(r^2 N […] , y)), where $\Phi(x, y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a 3D generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively.
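
For orientation, the discrete transform that the butterfly algorithm approximates can be written down directly. The following is a minimal one-dimensional sketch of that direct, quadratic-cost summation, not the butterfly algorithm itself and not code from the paper. It assumes an oscillatory kernel of the form K(x, y) = exp(iΦ(x, y)); the phase function `phase`, the helper `direct_transform`, and the sample points are all hypothetical choices made for illustration.

```python
import cmath
import math


def phase(x, y):
    # Hypothetical smooth, real-valued phase function (illustration only).
    return 2.0 * math.pi * x * y


def direct_transform(targets, sources, weights, phase_fn):
    """Directly evaluate f(x_i) = sum_j exp(i * phase_fn(x_i, y_j)) * g_j.

    This costs O(len(targets) * len(sources)) operations; it is the
    baseline that a fast algorithm would approximate, not a fast method.
    """
    return [
        sum(cmath.exp(1j * phase_fn(x, y)) * g for y, g in zip(sources, weights))
        for x in targets
    ]


# Assumed sample problem: n uniformly spaced sources in [0, 1) and
# integer targets, so the sum reduces to an unnormalized inverse DFT.
n = 8
sources = [j / n for j in range(n)]
targets = list(range(n))
weights = [1.0] * n
values = direct_transform(targets, sources, weights, phase)
```

With these particular sample points the sum at target 0 equals n and the sums at the other targets cancel up to rounding error, which gives a quick sanity check on the direct evaluation.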

Parallel Matrix Multiplication: A Systematic Journey

Schätz, van de Geijn, Poulson 2016. SIAM J. Sci. Comput.

We expose a systematic approach for developing distributed-memory parallel matrix-matrix multiplication algorithms. The journey starts with a description of how matrices are distributed to meshes of nodes (e.g., MPI processes), relates these distributions to scalable parallel implementation of matrix-vector multiplication and rank-1 update, continues on to reveal a family of matrix-matrix multiplication algorithms that view the nodes as a two-dimensional (2D) mesh, and finishes with extending these 2D algorithms to so-called three-dimensional (3D) algorithms that view the nodes as a 3D mesh. A cost analysis shows that the 3D algorithms can attain the (order of magnitude) lower bound for the cost of communication. The paper introduces a taxonomy for the resulting family of algorithms and explains how all algorithms have merit depending on parameters such as the sizes of the matrices and architecture parameters. The techniques described in this paper are at the heart of the Elemental distributed-memory linear algebra library. Performance results from implementation within and with this library are given on a representative distributed-memory architecture, the IBM Blue Gene/P supercomputer.
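
The idea of distributing matrices over a 2D mesh and building matrix-matrix multiplication from rank-1 updates can be illustrated with a small sequential simulation. The sketch below is not the paper's algorithm or Elemental's API: it assumes an element-wise cyclic layout in which entry (i, j) lives on mesh node (i mod r, j mod c), and the helper names `owner`, `distribute`, and `mesh_matmul` are hypothetical. Dictionary lookups stand in for the broadcasts a real MPI implementation would perform.

```python
def owner(i, j, r, c):
    # Assumed element-wise cyclic layout on an r x c mesh.
    return (i % r, j % c)


def distribute(M, r, c):
    """Split a dense matrix (list of rows) into per-node dicts keyed by global index."""
    local = {(s, t): {} for s in range(r) for t in range(c)}
    for i, row in enumerate(M):
        for j, v in enumerate(row):
            local[owner(i, j, r, c)][(i, j)] = v
    return local


def mesh_matmul(A, B, r, c):
    """Simulate C = A * B as a sequence of rank-1 updates with C kept in place."""
    m, n, inner = len(A), len(B[0]), len(B)
    A_loc, B_loc = distribute(A, r, c), distribute(B, r, c)
    C_loc = {node: {} for node in A_loc}
    for k in range(inner):
        col_owner = k % c  # mesh column that holds column k of A
        row_owner = k % r  # mesh row that holds row k of B
        for (s, t), c_local in C_loc.items():
            # Stand-in for a broadcast within mesh row s: the part of A[:, k]
            # whose row indices belong to mesh row s.
            a_piece = {i: A_loc[(s, col_owner)][(i, k)] for i in range(s, m, r)}
            # Stand-in for a broadcast within mesh column t: the part of B[k, :]
            # whose column indices belong to mesh column t.
            b_piece = {j: B_loc[(row_owner, t)][(k, j)] for j in range(t, n, c)}
            # Local rank-1 update of the entries of C owned by node (s, t).
            for i, a in a_piece.items():
                for j, b in b_piece.items():
                    c_local[(i, j)] = c_local.get((i, j), 0) + a * b
    # Gather the distributed result into a dense matrix for inspection.
    C = [[0] * n for _ in range(m)]
    for c_local in C_loc.values():
        for (i, j), v in c_local.items():
            C[i][j] = v
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = mesh_matmul(A, B, 2, 2)
```

Each simulated node only ever touches the entries of C that it owns, which is the property that lets the real 2D algorithms keep C stationary and communicate only pieces of A and B.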
