“…For SGD with a constant learning rate, there has been recent progress on quantifying the dimension dependence of the sample complexity for various tasks on general (pseudo- or quasi-)convex objectives [14, 15, 24, 33, 53, 68] and on special classes of non-convex objectives [6, 31, 71]. There has also been important work on scaling limits as the dimension tends to infinity for the specific problems of linear regression [55, 76], online PCA [41, 76], and phase retrieval [71] from random starts, and for teacher-student networks [32, 64, 65, 73] and two-layer networks for XOR Gaussian mixtures [60] from warm starts. We also note that the study of high-dimensional regimes of gradient descent and Langevin dynamics has a history from the statistical physics perspective, for example, in [17, 21, 22, 45, 48, 67].…”