Primal and dual block coordinate descent methods are iterative methods for solving regularized and unregularized optimization problems. Distributed-memory parallel implementations of these methods have become popular in analyzing large machine learning datasets. However, existing implementations communicate at every iteration, which, on modern data center and supercomputing architectures, often dominates the cost of floating-point computation. Recent results on communication-avoiding Krylov subspace methods suggest that large speedups are possible by reorganizing iterative algorithms to avoid communication. We show how applying similar algorithmic transformations can lead to primal and dual block coordinate descent methods that communicate only every s iterations (where s is a tuning parameter) instead of every iteration for the regularized least-squares problem. We show that the communication-avoiding variants reduce the number of synchronizations by a factor of s on distributed-memory parallel machines without altering the convergence rate, and attain strong scaling speedups of up to 6.1× on a Cray XC30 supercomputer.

Key words. primal and dual methods, communication-avoiding algorithms, block coordinate descent, ridge regression

AMS subject classifications. 15A06; 62J07; 65Y05; 68W10

1. Introduction. The running time of an algorithm depends on computation, the number of arithmetic operations (F), and communication, the cost of data movement. The communication cost includes the "bandwidth cost," i.e., the number, W, of words sent either between levels of a memory hierarchy or between processors over a network, and the "latency cost," i.e., the number, L, of messages sent, where a message either consists of a group of contiguous words being sent or is used for interprocess synchronization (a standard model combining these three terms is sketched at the end of this section). On modern computer architectures, communicating data often takes much longer than performing a floating-point operation, and this gap continues to widen. It is therefore especially important to design algorithms that minimize communication in order to attain high performance on modern computer architectures.

Communication-avoiding algorithms are a new class of algorithms that attain large speedups on modern distributed-memory parallel architectures through careful algorithmic transformations [5]. Much of direct and iterative linear algebra has been reorganized to avoid communication, leading to significant performance improvements over existing state-of-the-art libraries [5, 4, 9, 29, 45, 52]. The results on communication-avoiding Krylov subspace methods [9, 21, 29] are particularly relevant to our work.

The origins of communication-avoiding Krylov subspace methods lie in earlier work on s-step Krylov methods. Van Rosendale's s-step conjugate gradients method [50], Chronopoulos and Gear's s-step methods for preconditioned and unpreconditioned symmetric linear systems [15, 16], Chronopoulos and Swanson's s-step methods for unsymmetric linear systems [17], and Kim and Chronopoulos's s-step non-symmetric Lanczos method [31] were designed...
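
The three cost terms above are commonly combined into a single running-time model. The following formulation is standard in the communication-avoiding literature; the machine parameters γ, β, and α are introduced here for illustration and do not appear in the excerpt above:

    T = γ · F + β · W + α · L,

where γ is the time per floating-point operation, β is the time per word moved, and α is the per-message latency. Avoiding communication means reorganizing an algorithm to shrink W and, especially, L, possibly at the cost of a modest increase in F.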
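
For concreteness, the regularized least-squares (ridge regression) problem named in the abstract can be written as

    min_x (1/2) ||Ax − b||_2^2 + (λ/2) ||x||_2^2,

and, writing r = Ax − b for the residual, the exact block minimization that a primal BCD step applies to a coordinate block J (with A_J the corresponding columns of A and x_J the corresponding entries of x; this block notation is ours) is

    x_J ← x_J − (A_J^T A_J + λI)^{−1} (A_J^T r + λ x_J).

Forming A_J^T r and A_J^T A_J is the step that requires communication when the rows of A are distributed across processors, since both are reductions over the row partition.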
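
The NumPy sketch below illustrates the s-step reorganization described in the abstract on a single machine. The standard loop needs A_J^T r and A_J^T A_J fresh at every iteration (one reduction per iteration when the rows of A are distributed), while the s-step variant picks its s blocks up front, forms one Gram matrix for their union, and then runs the s updates using only that local data. This is a minimal sketch under our own naming and block-sampling choices (uniform sampling with replacement from a given list of index blocks), not the authors' implementation; it performs no actual distribution and only demonstrates the algebraic regrouping, with comments marking where reductions would occur in a distributed run.

```python
import numpy as np

def bcd_ridge(A, b, lam, blocks, n_iter, seed=0):
    """Standard primal BCD for ridge regression (illustrative sketch).
    Each iteration forms A_J^T r and A_J^T A_J; with rows of A
    distributed, both require a reduction, i.e. communication."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    r = -b.astype(float)                      # residual r = A x - b at x = 0
    for _ in range(n_iter):
        J = blocks[rng.integers(len(blocks))]
        AJ = A[:, J]
        g = AJ.T @ r + lam * x[J]             # block gradient   (reduction)
        H = AJ.T @ AJ + lam * np.eye(len(J))  # block Hessian    (reduction)
        d = np.linalg.solve(H, -g)            # exact block minimization
        x[J] += d
        r += AJ @ d
    return x

def ca_bcd_ridge(A, b, lam, blocks, n_outer, s, seed=0):
    """s-step variant: the s blocks of one outer iteration are chosen up
    front, so one Gram matrix G = A_S^T A_S and one vector w = A_S^T r
    (a single reduction) replace s rounds of communication.  With the
    same seed and block order, the iterates match bcd_ridge with
    n_iter = n_outer * s up to roundoff."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    r = -b.astype(float)
    for _ in range(n_outer):
        picks = [blocks[rng.integers(len(blocks))] for _ in range(s)]
        S = np.concatenate(picks)             # concatenated block indices
        offs = np.cumsum([0] + [len(J) for J in picks])
        AS = A[:, S]
        G = AS.T @ AS                         # one Gram matrix  (one reduction)
        w = AS.T @ r                          # A_S^T r          (same reduction)
        dS = np.zeros(len(S))                 # accumulated block steps
        for k, J in enumerate(picks):
            sl = slice(offs[k], offs[k + 1])
            g = w[sl] + lam * x[J]
            H = G[sl, sl] + lam * np.eye(len(J))
            d = np.linalg.solve(H, -g)
            x[J] += d
            dS[sl] += d
            w += G[:, sl] @ d                 # keep w = A_S^T r consistent, locally
        r += AS @ dS                          # residual catch-up once per s steps
    return x
```

For instance, with rng = np.random.default_rng(1), A = rng.standard_normal((1000, 200)), b = rng.standard_normal(1000), blocks = [np.arange(i, i + 10) for i in range(0, 200, 10)], and lam = 1.0, the calls bcd_ridge(A, b, 1.0, blocks, 100) and ca_bcd_ridge(A, b, 1.0, blocks, 20, 5) agree up to roundoff: the s-step version trades s small reductions for one larger Gram computation, which is exactly the latency-for-bandwidth trade-off the abstract describes.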