“…Another line of work on memory capacity has studied regression problems, showing that shallow networks with O(n) parameters can fit n arbitrary input/output pairs [4,11,23,26], whereas o(n) parameters suffice for deep networks, i.e., those with ω(1) layers [17,22]. However, these results assume exact arithmetic, so they do not apply to neural networks executed on computers, which can represent only a tiny subset of the reals (e.g., floating-point numbers) and perform inexact operations (e.g., floating-point arithmetic) [20,24].…”
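The gap between exact real arithmetic and machine arithmetic that the excerpt alludes to is easy to observe directly. The following sketch (not from the paper; it only illustrates the general point about IEEE-754 doubles) shows that floating-point addition is non-associative and that representable values are finitely spaced, so small contributions can be absorbed entirely:

```python
import sys

# Non-associativity: with exact real arithmetic both sums would be equal,
# but each floating-point addition rounds its result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False under IEEE-754 double precision

# Finite representability: near 1e16 the spacing between consecutive
# doubles exceeds 1, so adding 1.0 is lost to rounding.
print(1e16 + 1.0 == 1e16)  # True

# Machine epsilon bounds the relative spacing of representable doubles.
print(sys.float_info.epsilon)
```

Arguments that rely on cancellations or exact fits of n input/output pairs can therefore fail once every intermediate value must be rounded to this finite grid.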