“…Such DNNs with random weights are deeply connected to Gaussian processes and kernel methods (Daniely, Frostig, & Singer, 2016; Lee et al., 2018; Matthews, Rowland, Hron, Turner, & Ghahramani, 2018; Jacot, Gabriel, & Hongler, 2018). Furthermore, the theory of the neural tangent kernel (NTK) shows that in sufficiently wide DNNs the trained parameters remain close to their random initialization, so the performance of trained DNNs is determined by the NTK at initialization (Jacot et al., 2018; Lee et al., 2019; Arora et al., 2019). Karakida, Akaho, and Amari (2019b) focused on the Fisher information matrix (FIM) for the mean squared error (MSE) loss and proposed a framework that expresses certain eigenvalue statistics in terms of order parameters.…”
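As a concrete illustration of the objects discussed above, the following minimal sketch computes the empirical NTK Gram matrix of a small, randomly initialized MLP at initialization. The network sizes, the `init_mlp` / `empirical_ntk` helpers, and the toy inputs are assumptions for illustration only and are not taken from the cited papers; for an MSE loss with a scalar output, the FIM equals (1/n) Jᵀ J and therefore shares its nonzero eigenvalues with the (1/n) J Jᵀ matrix computed here, which is the kind of eigenvalue statistic studied by Karakida et al.

```python
# Minimal sketch (toy setup, not the referenced papers' code):
# empirical NTK at initialization, Theta_ij = <d f(x_i)/d theta, d f(x_j)/d theta>.
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Gaussian (1/sqrt(fan-in)) initialization of an MLP with the given layer sizes."""
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, wkey = jax.random.split(key)
        params.append((jax.random.normal(wkey, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params


def mlp(params, x):
    """Scalar-output MLP with tanh activations: f(x; theta)."""
    h = x
    for w, b in params[:-1]:
        h = jnp.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b)[0]


def empirical_ntk(params, xs):
    """NTK Gram matrix J J^T, where J is the parameter Jacobian over the inputs xs."""
    def flat_grad(x):
        g = jax.grad(mlp)(params, x)  # pytree of per-parameter gradients
        return jnp.concatenate([jnp.ravel(leaf) for leaf in jax.tree_util.tree_leaves(g)])

    J = jax.vmap(flat_grad)(xs)       # shape (n, num_params)
    return J @ J.T                    # shape (n, n)


key = jax.random.PRNGKey(0)
params = init_mlp(key, [3, 64, 64, 1])
xs = jax.random.normal(jax.random.PRNGKey(1), (5, 3))

ntk = empirical_ntk(params, xs)
# Eigenvalue statistics of the NTK / (dual) FIM at initialization.
print(jnp.linalg.eigvalsh(ntk))
```

In the infinite-width limit the random matrix computed here concentrates around a deterministic kernel, which is why the NTK at initialization characterizes the behavior of the trained network in the regime described above.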