“…(i) deriving prediction rates of the empirical risk minimizers (1) or (2); (ii) finding an optimization algorithm that identifies the corresponding empirical risk minimizers. Convergence rates of empirical risk minimizers (ERM) over the classes of deep ReLU networks are studied in [4], [13], [15] and [18]. In [4] it is shown that the ERM of the form (1), with W_n being the set of weight vectors with coordinates {0, ±1/2, ±1, 2}, attains, up to logarithmic factors, the minimax rates of prediction of β-smooth functions.…”
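
To make the notion of an ERM over a discrete weight set concrete, the following is a minimal sketch and not the construction of [4]. It assumes that (1) is a least-squares empirical risk, which the excerpt does not show, and it replaces the deep networks of the text with a hypothetical one-hidden-layer ReLU network of width 2. The names `WEIGHT_SET`, `network`, `empirical_risk` and `erm` are illustrative. Because the weight set is finite, the minimizer can be found here by exhaustive search, which is feasible only at this toy size.

```python
from itertools import product

# Allowed coordinate values, as stated in the text: {0, ±1/2, ±1, 2}.
WEIGHT_SET = (0.0, 0.5, -0.5, 1.0, -1.0, 2.0)


def relu(z):
    """ReLU activation."""
    return z if z > 0.0 else 0.0


def network(params, x):
    """Toy network (assumption): v1*relu(w1*x + b1) + v2*relu(w2*x + b2)."""
    w1, b1, v1, w2, b2, v2 = params
    return v1 * relu(w1 * x + b1) + v2 * relu(w2 * x + b2)


def empirical_risk(params, xs, ys):
    """Mean squared error on the sample (assumed form of the risk)."""
    return sum((network(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def erm(xs, ys):
    """Exhaustive search over all 6**6 weight vectors with coordinates in WEIGHT_SET."""
    best_params, best_risk = None, float("inf")
    for params in product(WEIGHT_SET, repeat=6):
        risk = empirical_risk(params, xs, ys)
        if risk < best_risk:
            best_params, best_risk = params, risk
    return best_params, best_risk


# Noise-free sample from |x| = relu(x) + relu(-x), which the toy class contains.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [abs(x) for x in xs]
best_params, best_risk = erm(xs, ys)
print(best_params, best_risk)
```

In this sketch the target lies inside the toy class, so the search reaches zero empirical risk. The search cost grows exponentially in the number of weights, which illustrates why item (ii), finding an algorithm that identifies the minimizer, is a separate problem from item (i).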