“…We mentioned above the Burer-Monteiro approach to semidefinite programming [9,14], low-rank optimization [22,32,36], computer vision [17], and neural networks [38]. In [47,29], the authors compare the stationary points of (Q) to those of (P) in the case of linear neural networks and prove a special case of our Theorem 2.10 characterizing "1 ⇒ 1". The training of general neural networks, and risk minimization more generally, is naturally given in the form (Q); see [47,Appx.…”