2021 IEEE International Symposium on Information Theory (ISIT)
DOI: 10.1109/isit45174.2021.9518254

Differentially Quantized Gradient Descent

Abstract: Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The only information that the server knows about the problem instance is what it receives from the worker via a rate-limited noiseless communication channel. We introduce the principle we call differential quantization (DQ) that prescribes that the past quantization errors…
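
The truncated sentence above states the DQ principle only in part, but the compensation idea it points to can be illustrated with a toy sketch. The snippet below is a minimal, hedged illustration of error-compensated quantized gradient descent on a quadratic; it is not the paper's exact DQ-GD scheme, and the quantizer resolution DELTA, the step size, and the objective are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative settings (not taken from the paper): a toy quadratic
# objective, a fixed step size, and a uniform quantizer of step DELTA.
DELTA = 0.05   # quantizer resolution (assumed)
STEP = 0.1     # gradient-descent step size (assumed)


def grad(x):
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2."""
    return x


def quantize(v):
    """Uniform mid-tread quantizer applied coordinate-wise."""
    return DELTA * np.round(v / DELTA)


def dq_gd_sketch(x0, iters=100):
    """Error-compensated quantized GD: the worker folds its past
    quantization error back into the next message, so the quantized
    trajectory tracks the unquantized one."""
    x = np.array(x0, dtype=float)   # server iterate (worker can track it)
    err = np.zeros_like(x)          # accumulated quantization error
    for _ in range(iters):
        g = grad(x)                 # worker computes the true gradient
        q = quantize(g + err)       # compensate past error before quantizing
        err = (g + err) - q         # carry the new quantization error forward
        x = x - STEP * q            # server applies the quantized update
    return x


if __name__ == "__main__":
    print(dq_gd_sketch([3.0, -2.0]))   # settles near the minimizer at 0
```

In this sketch, dropping the err term leaves naive quantized descent, whose accuracy floor is set by DELTA; with the compensation term, the quantization error is prevented from accumulating across iterations.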

Cited by 13 publications (5 citation statements) · References: 22 publications

Citation statements:
“…We note that results of a similar conceptual flavor are established in [29] and [23] in the context of stabilization of an LTI system, and optimization, respectively. While the results in these papers pertain to deterministic settings, Lemma 1 carefully exploits statistical concentration bounds specific to the stochastic process we study.…”
mentioning · confidence: 61%
“…A common abstraction for analyzing optimization under limited communication is one where a worker agent transmits quantized gradients to a server over a finite bit-rate communication channel [21][22][23]. Inspired by this model, for our problem of interest, we introduce and study a new linear stochastic bandit formulation comprising an agent connected to a decision-making entity (server) by a noiseless communication channel of finite capacity B; see Fig.…”
Section: Introduction · mentioning · confidence: 99%
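
One concrete way to read the finite-capacity channel in the abstraction above is a per-round budget of B bits shared across the d gradient coordinates. The helper below is a hedged sketch of such a budgeted uniform quantizer; the clipping range R and the even split of bits across coordinates are illustrative assumptions, not choices made in the cited works.

```python
import numpy as np


def quantize_with_budget(g, B, R=1.0):
    """Quantize a d-dimensional vector using a total budget of B bits per
    round, split evenly across coordinates (an illustrative choice).

    Each coordinate is clipped to [-R, R] and mapped to one of
    2**(B // d) uniform levels, so the per-coordinate resolution is set
    directly by the channel capacity B.
    """
    d = g.size
    bits_per_coord = max(B // d, 1)   # at least one bit per coordinate
    levels = 2 ** bits_per_coord
    clipped = np.clip(g, -R, R)
    # Map [-R, R] onto the index set {0, ..., levels - 1} and back to values.
    idx = np.round((clipped + R) / (2 * R) * (levels - 1))
    return -R + idx * (2 * R) / (levels - 1)


# Example: a 4-dimensional gradient quantized with a 16-bit round budget.
print(quantize_with_budget(np.array([0.3, -0.7, 0.05, 0.9]), B=16))
```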
“…However, calling upon the theory of predictive quantization (e.g., the sigma-delta modulation adopted in PCM [44]), we see that the impact of quantization errors on convergence can be reduced by properly leveraging the inherent memory arising in recursive implementations such as gradient descent. Two canonical paradigms to achieve this goal are error-feedback management [45], [46], [47], [48] and differential quantization [49], [50], which, perhaps surprisingly, have been applied to distributed optimization quite recently.…”
Section: B. Compression for Distributed Optimization · mentioning · confidence: 99%
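
The error-feedback paradigm referenced here can be sketched with a biased compressor, such as scaled sign compression, where the residual discarded by the compressor is stored locally and re-injected before the next compression step. The compressor, step size, and toy objective below are assumptions chosen for illustration rather than the schemes of [45]-[48].

```python
import numpy as np


def sign_compress(v):
    """Scaled sign compressor: one bit per coordinate plus a scalar norm."""
    return (np.linalg.norm(v, 1) / v.size) * np.sign(v)


def ef_gd_sketch(grad, x0, step=0.1, iters=200):
    """Gradient descent with error feedback: the part of each update lost
    to compression is remembered and re-injected at the next iteration."""
    x = np.array(x0, dtype=float)
    memory = np.zeros_like(x)        # residual left over from past compressions
    for _ in range(iters):
        p = step * grad(x) + memory  # add back what was previously lost
        c = sign_compress(p)         # transmit only the compressed update
        memory = p - c               # store the new residual
        x = x - c                    # server-side update
    return x


# Toy run on f(x) = 0.5 * ||x||^2, whose gradient is simply x.
print(ef_gd_sketch(lambda x: x, [2.0, -1.5]))
```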
“…This is because: i) the innovation typically exhibits a reduced range as compared to the entire sample; and ii) owing to the correlation between consecutive samples, quantizing the entire sample will waste resources by transmitting redundant information. The information-theoretic fundamental limits of (non-stochastic) gradient descent under differential quantization have been recently established in [50].…”
Section: B. Compression for Distributed Optimization · mentioning · confidence: 99%
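
Both points i) and ii) can be seen in a small DPCM-style sketch: the sender quantizes only the innovation, i.e., the difference between the new sample and the receiver's latest reconstruction, and both sides update the same reconstruction. The slowly varying toy signal and the quantizer step below are illustrative assumptions.

```python
import numpy as np


def quantize(v, delta=0.1):
    """Uniform quantizer with step delta (illustrative resolution)."""
    return delta * np.round(v / delta)


def innovation_coding(samples):
    """DPCM-style predictive quantization: transmit Q(x_t - xhat_{t-1})
    instead of Q(x_t); the receiver adds it to its previous reconstruction."""
    xhat = 0.0
    recon = []
    for x in samples:
        innovation = x - xhat     # small when consecutive samples are correlated
        q = quantize(innovation)  # the reduced range needs fewer bits to cover
        xhat = xhat + q           # sender and receiver stay synchronized
        recon.append(xhat)
    return np.array(recon)


# Correlated toy signal: a slow random walk (strong sample-to-sample correlation).
rng = np.random.default_rng(0)
signal = np.cumsum(0.05 * rng.standard_normal(50))
print(np.max(np.abs(signal - innovation_coding(signal))))  # bounded by delta / 2
```

Because the quantization error of each step feeds into the next innovation, the reconstruction error in this sketch stays within half the quantizer step instead of accumulating over time.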