2020
DOI: 10.48550/arxiv.2009.13725
Preprint

On Robustness of the Normalized Subgradient Method with Randomly Corrupted Subgradients

Abstract: Numerous modern optimization and machine learning algorithms rely on subgradient information being trustworthy and hence, they may fail to converge when such information is corrupted. In this paper, we consider the setting where subgradient information may be arbitrarily corrupted (with a given probability) and study the robustness properties of the normalized subgradient method. Under the probabilistic corruption scenario, we prove that the normalized subgradient method, whose updates rely solely on direction…
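The update the abstract refers to uses only the direction of the (possibly corrupted) subgradient. Below is a minimal sketch of that idea, not the paper's algorithm: the paper treats constrained convex problems satisfying an acute angle condition, whereas this toy version is unconstrained, and the corruption model, step-size schedule, and function names are illustrative assumptions.

```python
import numpy as np

def normalized_subgradient_method(subgrad, x0, steps, step_size, corrupt_prob=0.0, rng=None):
    """Direction-only subgradient steps; with probability `corrupt_prob` the
    oracle returns an arbitrary vector (modeled here as Gaussian noise)
    instead of a true subgradient."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for t in range(steps):
        g = subgrad(x)
        if rng.random() < corrupt_prob:
            g = rng.normal(size=x.shape)      # corrupted oracle output
        norm = np.linalg.norm(g)
        if norm > 0:                          # skip the step at a zero subgradient
            x -= step_size(t) * g / norm      # normalized update: only the direction is used
    return x

# Toy usage: minimize f(x) = ||x - 1||_1 with 20% of subgradients corrupted.
x_hat = normalized_subgradient_method(
    subgrad=lambda x: np.sign(x - 1.0),
    x0=np.zeros(5),
    steps=2000,
    step_size=lambda t: 1.0 / np.sqrt(t + 1),
    corrupt_prob=0.2,
)
```

Because the step length is fixed by the (diminishing) step size rather than by the subgradient's magnitude, a single corrupted vector can move the iterate by at most that step size, which is roughly the intuition behind the robustness property the abstract describes.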

Cited by 2 publications (4 citation statements)
References 22 publications

“…We allow arbitrary adversarial corruption in a centralized setup, which prevents robust aggregation to create gradient estimates. Closest to our setup is our previous work in [37], which studies robustness of normalized subgradient method in a randomly corrupted subgradient setting. However, [37] studies a full gradient type method for constrained convex optimization problems satisfying a certain acute angle condition, whereas this work considers a block coordinate descent type method for unconstrained non-convex optimization problems.…”
Section: Introduction (mentioning)
confidence: 99%
“…Closest to our setup is our previous work in [37], which studies robustness of normalized subgradient method in a randomly corrupted subgradient setting. However, [37] studies a full gradient type method for constrained convex optimization problems satisfying a certain acute angle condition, whereas this work considers a block coordinate descent type method for unconstrained non-convex optimization problems. Paper Organization: The remainder of the paper is organized as follows.…”
Section: Introduction (mentioning)
confidence: 99%
“…Seeing the need for large batch sizes for variance reduction of stochastic gradients as a drawback of normalized updates, a recent work [33] proves that adding momentum removes the need for large batch sizes on non-convex objectives while matching the best-known convergence rates. In a preliminary conference report [34], we investigated the robustness properties of the normalized subgradient method for solving deterministic optimization problems in a centralized fashion. In the current work, we expand [34] into a distributed setup with a stochastic objective function, additionally study non-convex objectives both theoretically and numerically, and employ two additional layers of defense by means of robust mean estimation before applying normalization to improve our algorithm.…”
mentioning
confidence: 99%
“…In a preliminary conference report [34], we investigated the robustness properties of the normalized subgradient method for solving deterministic optimization problems in a centralized fashion. In the current work, we expand [34] into a distributed setup with a stochastic objective function, additionally study non-convex objectives both theoretically and numerically, and employ two additional layers of defense by means of robust mean estimation before applying normalization to improve our algorithm.…”
mentioning
confidence: 99%
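The citing work quoted above describes adding robust mean estimation before normalization in a distributed setting. The exact aggregation rule is not given in these excerpts, so the sketch below uses a coordinate-wise median purely as an illustrative stand-in; the function name and worker setup are hypothetical.

```python
import numpy as np

def robust_normalized_direction(worker_grads):
    """Two-stage defense sketched from the excerpt above: (1) robustly
    aggregate per-worker gradients (coordinate-wise median is an assumed,
    illustrative choice), then (2) keep only the direction of the aggregate."""
    agg = np.median(np.stack(worker_grads), axis=0)  # robust mean estimate
    norm = np.linalg.norm(agg)
    return agg / norm if norm > 0 else agg

# Usage: two honest workers and one corrupted worker.
direction = robust_normalized_direction([
    np.array([1.0, 2.0]),
    np.array([1.2, 1.8]),
    np.array([50.0, -40.0]),   # corrupted gradient
])
```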