“…SGD with biased noise. Many algorithms can be viewed as SGD with structured but potentially biased noise, including SGD with (biased) compression (Stich et al., 2018; Gorbunov et al., 2020), delayed SGD (Mania et al., 2017; Dutta et al., 2018), local SGD (Stich, 2019), federated learning methods (Karimireddy et al., 2020; Yuan & Ma, 2020; Mitra et al., 2021; Nguyen et al., 2022), decentralized optimization methods (Yu et al., 2019; Koloskova et al., 2020), and many others. Convergence analyses for such methods often use techniques like perturbed iterate analysis (Mania et al., 2017).…”