Metalenses, flat lenses made with optical metasurfaces, promise to enable thinner, cheaper, and better imaging systems. Achieving a sufficient angular field of view (FOV) is crucial to that goal and requires a tailored incident-angle-dependent response. Here, we show that there is an intrinsic trade-off between achieving a desired broad-angle response and reducing the thickness of the device, a trade-off that originates from the Fourier-transform duality between space and angle. One can write down the transmission matrix describing the desired angle-dependent response, convert it to the spatial basis, where its degree of nonlocality can be quantified through a lateral spreading, and determine the minimal device thickness from that required lateral spreading. The approach is general; applied to wide-FOV lenses, it predicts the minimal thickness as a function of the FOV, lens diameter, and numerical aperture. The bound is tight, as some inverse-designed multilayer metasurfaces can approach the predicted minimal thickness. This work offers guidance for the design of nonlocal metasurfaces, proposes a new framework for establishing bounds, and reveals the relation between angular diversity and spatial footprint in multi-channel systems.
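The basis change at the heart of this argument can be illustrated with a minimal sketch. It assumes a 1D, periodic, N-pixel discretization in which the plane-wave (angle) basis is reached with a unitary DFT, so a transmission matrix in the angle basis maps to the spatial basis as F^H T F. The spreading metric used here, a worst-case RMS lateral displacement, is a hypothetical stand-in chosen for illustration and is not necessarily the definition used in this work; the sketch also does not compute the thickness bound itself. The names `dft_matrix`, `angle_to_space`, and `lateral_spread` are illustrative.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    """Unitary DFT matrix F, so that F @ x == np.fft.fft(x, norm="ortho")."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)


def angle_to_space(t_angle: np.ndarray) -> np.ndarray:
    """Change basis: angular (plane-wave) amplitudes -> spatial samples.

    If a = F x are the angular amplitudes of a spatial field x, the output
    field is y = F^H (T_angle a) = (F^H T_angle F) x.
    """
    f = dft_matrix(t_angle.shape[0])
    return f.conj().T @ t_angle @ f


def lateral_spread(t_space: np.ndarray, dx: float = 1.0) -> float:
    """Hypothetical nonlocality metric: worst-case RMS lateral displacement.

    For each input pixel j, weight the circular distance to every output
    pixel i by |t_ij|^2, take the RMS, and return the maximum over inputs,
    scaled by the pixel pitch dx.
    """
    n = t_space.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    d = np.abs(i - j)
    d = np.minimum(d, n - d)  # periodic (circular) distance in pixels
    w = np.abs(t_space) ** 2  # power from input j (column) to output i (row)
    rms = np.sqrt((w * d**2).sum(axis=0) / w.sum(axis=0))
    return float(rms.max() * dx)


n = 64
k = np.arange(n)

# Angle-independent response (identity): no lateral spreading.
print(lateral_spread(angle_to_space(np.eye(n))))  # close to 0

# A phase that is linear in angle is, by the Fourier shift theorem,
# a lateral shift in space: here by 3 pixels.
t_shift = np.diag(np.exp(-2j * np.pi * k * 3 / n))
print(lateral_spread(angle_to_space(t_shift)))  # close to 3
```

The shift example shows the space-angle duality in its simplest form: a response that varies with incident angle (here only in phase) corresponds to light being displaced laterally, whereas a response that is diagonal in the spatial basis, such as a conventional local phase mask, has zero spreading.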