Thin, flat diffractive optical elements (DOEs) are important components in integrated optics and offer a promising solution for hyperspectral imaging (HI). Such systems are expected to be compact, to operate in snapshot mode, and to provide both a large depth of field (DoF) and high resolution. However, the trade-off between spectral and spatial resolution imposed by the restricted DoF limits the range of HI applications. To address this, we exploit spatial and spectral sparsity priors and propose a spatial–spectral achromatic (SSA) neural network that optimizes, end to end, a broadband imaging system with a DOE, enabling snapshot, achromatic, extreme-DoF HI. We experimentally demonstrate that our system captures achromatic, high-fidelity hyperspectral images in a single snapshot, with 25 spectral channels spanning 420 nm to 660 nm, over object distances from 0.5 m to 5 m. The proposed system enables precise and dynamic spectral reconstruction within an extreme DoF, a capability previously unattainable with compact computational spectral cameras. This accurate spectral reconstruction highlights the potential of the system for applications such as precision agriculture, food quality inspection, and object detection.