Learning-based compressed sensing algorithms are widely used to recover the underlying datacube in snapshot compressive temporal imaging (SCTI), a technique for recording temporal data in a single exposure. Despite offering fast processing and high reconstruction performance, most deep-learning approaches are treated merely as substitutes for analytical-model-based reconstruction methods. In addition, these methods often presume ideal behavior of the optical instruments, neglecting deviations in the encoding and shearing processes. Consequently, they provide little feedback for evaluating SCTI's hardware performance, which limits reconstruction quality and robustness. To overcome these limitations, we develop a new end-to-end convolutional neural network, termed the deep high-dimensional adaptive net (D-HAN), that provides multifaceted, process-aware supervision of an SCTI system. The D-HAN includes three joint stages: four dense layers for shearing estimation, a set of parallel layers emulating the closed-form solution of SCTI's inverse problem, and a U-net structure that serves as a filtering step. In system design, the D-HAN optimizes the coded aperture and establishes SCTI's sensing geometry. In image reconstruction, the D-HAN senses the shearing operation and retrieves the three-dimensional scene. D-HAN-supervised SCTI is experimentally validated using compressed optical-streaking ultrahigh-speed photography to image a rotating spinner at an imaging speed of 20,000 frames per second. The D-HAN is expected to improve the reliability and stability of a variety of snapshot compressive imaging systems.
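To make the three-stage design concrete, the sketch below (written in PyTorch, which the abstract does not specify) illustrates one possible arrangement of the stages described above: a four-layer dense estimator for the shearing parameter, a placeholder stage standing in for the closed-form inversion, and a small U-net-style refinement filter. All tensor shapes, layer widths, and the simplified inversion are hypothetical assumptions chosen for readability; the sketch conveys the data flow only, not the authors' implementation.

```python
# Illustrative sketch only: layer sizes, tensor shapes, and the simplified
# "closed-form" stage are hypothetical placeholders, not the published D-HAN.
import torch
import torch.nn as nn


class ShearingEstimator(nn.Module):
    """Stage 1 (hypothetical sizes): four dense layers that regress a scalar
    shearing parameter from the flattened 2D snapshot measurement."""
    def __init__(self, in_features: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar shearing estimate
        )

    def forward(self, y_flat):
        return self.net(y_flat)


class UNetFilter(nn.Module):
    """Stage 3 (hypothetical): a small U-net-style block that refines the
    coarse datacube with a residual connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.dec(self.enc(x))  # residual refinement


class DHANSketch(nn.Module):
    """Three joint stages as named in the abstract; the closed-form inversion
    is approximated here by a learned 1x1 convolution that expands the
    snapshot into a coarse datacube."""
    def __init__(self, height: int, width: int, frames: int):
        super().__init__()
        self.shear = ShearingEstimator(height * width)
        self.inverse = nn.Conv2d(1, frames, kernel_size=1)  # stand-in for closed-form solution
        self.filter = UNetFilter(frames)

    def forward(self, snapshot):                 # snapshot: (B, 1, H, W)
        shear = self.shear(snapshot.flatten(1))  # estimated shearing parameter
        coarse = self.inverse(snapshot)          # coarse (B, T, H, W) datacube
        refined = self.filter(coarse)            # filtered reconstruction
        return refined, shear


if __name__ == "__main__":
    model = DHANSketch(height=64, width=64, frames=10)
    y = torch.randn(2, 1, 64, 64)    # dummy snapshot measurements
    cube, shear = model(y)
    print(cube.shape, shear.shape)   # torch.Size([2, 10, 64, 64]) torch.Size([2, 1])
```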