2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
DOI: 10.1109/cdc.2012.6426691

Communication-efficient algorithms for statistical optimization

Abstract: We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error (MSE) that decays as O(N …
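The average mixture (AVGM) step summarized in the abstract (split the N samples evenly across m machines, run a separate minimization on each subset, then average the estimates) can be sketched in a few lines. The least-squares loss, the synthetic data, and all function names below are illustrative assumptions for this sketch, not the paper's own implementation or experiments.

```python
# Minimal sketch of the average-mixture (AVGM) idea: split the N samples
# across m "machines", run a separate empirical-risk minimization on each
# subset, and average the resulting parameter estimates.
import numpy as np
from scipy.optimize import minimize

def local_erm(X, y, dim):
    """Minimize the empirical squared loss on one machine's subset."""
    def loss(w):
        r = X @ w - y
        return 0.5 * np.mean(r ** 2)
    return minimize(loss, np.zeros(dim)).x

def avgm(X, y, m):
    """Average-mixture estimator: average the m local ERM solutions."""
    n, dim = X.shape
    parts = np.array_split(np.arange(n), m)   # even split of the N samples
    local = [local_erm(X[idx], y[idx], dim) for idx in parts]
    return np.mean(local, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_star = np.array([1.0, -2.0, 0.5])       # illustrative ground truth
    X = rng.normal(size=(6000, 3))
    y = X @ w_star + 0.1 * rng.normal(size=6000)
    print(avgm(X, y, m=10))                   # close to w_star
```

Running the script prints an averaged estimate close to w_star; the paper's analysis concerns how the mean-squared error of such averaged estimates scales with N and m.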

Cited by 273 publications (388 citation statements); references 18 publications. Citing publications range from 2013 to 2022.

Citation statements (ordered by relevance):
“…At the other extreme, there are distributed methods using only a single round of communication, such as [24,36,38,80,81]. These methods require additional assumptions on the partitioning of the data, which are usually not satisfied in practice if the data are distributed "as is", i.e.…”
Section: Discussion and Related Workmentioning
confidence: 99%
“…When ERM is used and F(w) is λ-strongly convex, and f(w, z) is L-Lipschitz, H-smooth and has a J-Lipschitz Hessian, [29] obtain a guarantee on w̄ of the following form (in expectation over the samples):…”
Section: Average-at-the-endmentioning
confidence: 99%
“…Optimizing over λ, the best that can be ensured from (13) for learning problems requiring regularization is therefore only a sample complexity that scales as 1/ε³ rather than 1/ε². If ERM is used on each machine, [29] also suggested a bias-corrected approach that reduced the dependence on n in the second term to 1/n³ rather than 1/n², but the problematic dependence on λ remains. These deficiencies are not only in the analysis.…”
Section: Average-at-the-endmentioning
confidence: 99%
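The excerpt above mentions a bias-corrected variant of the averaging approach attributed to [29]. Below is a minimal sketch of one subsampling-based correction, assuming the combined estimate takes the form (theta_bar - r * theta_sub) / (1 - r), where theta_sub averages estimates recomputed on a random fraction r of each machine's data; this combination rule, the least-squares solver, and all names are assumptions made for illustration, not a verbatim reproduction of [29].

```python
# Minimal sketch of a subsampling-based bias correction for the averaged
# estimator. The combination rule and least-squares setup are assumptions
# for illustration, not a verbatim reproduction of [29].
import numpy as np

def local_erm(X, y):
    """Per-machine ERM; closed-form least squares for brevity."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bias_corrected_avgm(X, y, m, r=0.25, seed=0):
    """Average local estimates, then subtract a subsample-based bias estimate
    via (theta_bar - r * theta_sub) / (1 - r)  [assumed combination rule]."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    parts = np.array_split(np.arange(n), m)
    full_est, sub_est = [], []
    for idx in parts:
        full_est.append(local_erm(X[idx], y[idx]))
        # Re-estimate on a random fraction r of the same subset; assumes
        # r * (n / m) comfortably exceeds the parameter dimension.
        take = rng.choice(idx, size=int(r * len(idx)), replace=False)
        sub_est.append(local_erm(X[take], y[take]))
    theta_bar = np.mean(full_est, axis=0)
    theta_sub = np.mean(sub_est, axis=0)
    return (theta_bar - r * theta_sub) / (1.0 - r)
```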
“…An average mixture (AVGM) procedure for fitting the parameter of a parametric model has been studied by [10]. AVGM partitions the full available dataset into disjoint subsets, estimates the parameter within each subset, and finally combines the estimates by simple averaging.…”
Section: Introductionmentioning
confidence: 99%