Primal and dual block coordinate descent methods are iterative methods for solving regularized and unregularized optimization problems. Distributed-memory parallel implementations of these methods have become popular in analyzing large machine learning datasets. However, existing implementations communicate at every iteration, which, on modern data center and supercomputing architectures, often dominates the cost of floating-point computation. Recent results on communication-avoiding Krylov subspace methods suggest that large speedups are possible by reorganizing iterative algorithms to avoid communication. We show how applying similar algorithmic transformations can lead to primal and dual block coordinate descent methods that communicate only every s iterations (where s is a tuning parameter) instead of every iteration for the regularized least-squares problem. We show that the communication-avoiding variants reduce the number of synchronizations by a factor of s on distributed-memory parallel machines without altering the convergence rate, and attain strong scaling speedups of up to 6.1× on a Cray XC30 supercomputer.

Key words. primal and dual methods, communication-avoiding algorithms, block coordinate descent, ridge regression

AMS subject classifications. 15A06; 62J07; 65Y05; 68W10

1. Introduction. The running time of an algorithm depends on computation, the number of arithmetic operations (F), and communication, the cost of data movement. The communication cost includes the "bandwidth cost," i.e., the number, W, of words sent either between levels of a memory hierarchy or between processors over a network, and the "latency cost," i.e., the number, L, of messages sent, where a message either consists of a group of contiguous words being sent or is used for interprocess synchronization (a standard model combining these three terms is sketched at the end of this section). On modern computer architectures, communicating data often takes much longer than performing a floating-point operation, and this gap continues to widen. It is therefore especially important to design algorithms that minimize communication in order to attain high performance on modern computer architectures.

Communication-avoiding algorithms are a new class of algorithms that attain large speedups on modern distributed-memory parallel architectures through careful algorithmic transformations [5]. Much of direct and iterative linear algebra has been reorganized to avoid communication, leading to significant performance improvements over existing state-of-the-art libraries [5, 4, 9, 29, 45, 52]. The results on communication-avoiding Krylov subspace methods [9, 21, 29] are particularly relevant to our work.

The origins of communication-avoiding Krylov subspace methods lie in earlier work on s-step Krylov methods. Van Rosendale's s-step conjugate gradients method [50], Chronopoulos and Gear's s-step methods for preconditioned and unpreconditioned symmetric linear systems [15, 16], Chronopoulos and Swanson's s-step methods for unsymmetric linear systems [17], and Kim and Chronopoulos's s-step non-symmetric Lanczos method [31] were designed...
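
The three cost terms above are commonly combined into a single running-time model. The following formulation is standard in the communication-avoiding literature; the machine parameters γ, β, and α are introduced here for illustration and do not appear in the excerpt above:

    T = γ · F + β · W + α · L,

where γ is the time per floating-point operation, β is the time per word moved, and α is the per-message latency. Avoiding communication means reorganizing an algorithm to shrink W and, especially, L, possibly at the cost of a modest increase in F.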
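
For concreteness, the regularized least-squares (ridge regression) problem named in the abstract can be written as

    min_x (1/2) ||Ax − b||_2^2 + (λ/2) ||x||_2^2,

and, writing r = Ax − b for the residual, the exact block minimization that a primal BCD step applies to a coordinate block J (with A_J the corresponding columns of A and x_J the corresponding entries of x; this block notation is ours) is

    x_J ← x_J − (A_J^T A_J + λI)^{−1} (A_J^T r + λ x_J).

Forming A_J^T r and A_J^T A_J is the step that requires communication when the rows of A are distributed across processors, since both are reductions over the row partition.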
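
The NumPy sketch below illustrates the s-step reorganization described in the abstract on a single machine. The standard loop needs A_J^T r and A_J^T A_J fresh at every iteration (one reduction per iteration when the rows of A are distributed), while the s-step variant picks its s blocks up front, forms one Gram matrix for their union, and then runs the s updates using only that local data. This is a minimal sketch under our own naming and block-sampling choices (uniform sampling with replacement from a given list of index blocks), not the authors' implementation; it performs no actual distribution and only demonstrates the algebraic regrouping, with comments marking where reductions would occur in a distributed run.

```python
import numpy as np

def bcd_ridge(A, b, lam, blocks, n_iter, seed=0):
    """Standard primal BCD for ridge regression (illustrative sketch).
    Each iteration forms A_J^T r and A_J^T A_J; with rows of A
    distributed, both require a reduction, i.e. communication."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    r = -b.astype(float)                      # residual r = A x - b at x = 0
    for _ in range(n_iter):
        J = blocks[rng.integers(len(blocks))]
        AJ = A[:, J]
        g = AJ.T @ r + lam * x[J]             # block gradient   (reduction)
        H = AJ.T @ AJ + lam * np.eye(len(J))  # block Hessian    (reduction)
        d = np.linalg.solve(H, -g)            # exact block minimization
        x[J] += d
        r += AJ @ d
    return x

def ca_bcd_ridge(A, b, lam, blocks, n_outer, s, seed=0):
    """s-step variant: the s blocks of one outer iteration are chosen up
    front, so one Gram matrix G = A_S^T A_S and one vector w = A_S^T r
    (a single reduction) replace s rounds of communication.  With the
    same seed and block order, the iterates match bcd_ridge with
    n_iter = n_outer * s up to roundoff."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    r = -b.astype(float)
    for _ in range(n_outer):
        picks = [blocks[rng.integers(len(blocks))] for _ in range(s)]
        S = np.concatenate(picks)             # concatenated block indices
        offs = np.cumsum([0] + [len(J) for J in picks])
        AS = A[:, S]
        G = AS.T @ AS                         # one Gram matrix  (one reduction)
        w = AS.T @ r                          # A_S^T r          (same reduction)
        dS = np.zeros(len(S))                 # accumulated block steps
        for k, J in enumerate(picks):
            sl = slice(offs[k], offs[k + 1])
            g = w[sl] + lam * x[J]
            H = G[sl, sl] + lam * np.eye(len(J))
            d = np.linalg.solve(H, -g)
            x[J] += d
            dS[sl] += d
            w += G[:, sl] @ d                 # keep w = A_S^T r consistent, locally
        r += AS @ dS                          # residual catch-up once per s steps
    return x
```

For instance, with rng = np.random.default_rng(1), A = rng.standard_normal((1000, 200)), b = rng.standard_normal(1000), blocks = [np.arange(i, i + 10) for i in range(0, 200, 10)], and lam = 1.0, the calls bcd_ridge(A, b, 1.0, blocks, 100) and ca_bcd_ridge(A, b, 1.0, blocks, 20, 5) agree up to roundoff: the s-step version trades s small reductions for one larger Gram computation, which is exactly the latency-for-bandwidth trade-off the abstract describes.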