Three-dimensional (3D) imaging captures depth information from a scene and is used in a wide range of applications, from industrial environments to smartphones and autonomous driving. This paper summarises the results of a depth video super-resolution scheme tailored for single-photon avalanche diode (SPAD) image sensors, which produce 3D maps (32×64 pixels) at frame rates above 100 FPS. Consecutive frames are used to super-resolve and denoise depth maps via 3D convolutional neural networks with an upscaling factor of 4. Owing to the lack of noise-free, high-resolution depth maps captured with high-speed cameras, the neural network is trained on synthetic data generated in Unreal Engine, which is subsequently processed to resemble the data output by a SPAD sensor. The model is then tested on video sequences captured with a high-speed SPAD direct time-of-flight (dToF) sensor, processing frames at more than 30 frames per second. The super-resolved data shows a significant reduction in noise and enhanced edge detail in objects. We believe these results are relevant to improving the accuracy of object detection in autonomous vehicles for collision avoidance, as well as in AR/VR systems.
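To illustrate the core idea of pooling information across consecutive frames before upscaling, the following is a minimal toy sketch, not the paper's trained network: a hand-written 3D convolution with a fixed averaging kernel stands in for a learned 3D-conv layer, and nearest-neighbour replication stands in for the network's learned ×4 upsampling head. The frame count, noise level, and depth ramp are illustrative assumptions.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution over a (T, H, W) depth clip."""
    kt, kh, kw = kernel.shape
    T, H, W = clip.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(clip[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

# Toy SPAD-like input: 5 consecutive 32x64 depth frames with Gaussian noise
# (stand-in for sensor noise; parameters are illustrative, not from the paper)
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(1.0, 5.0, 64), (32, 1))   # smooth depth ramp, metres
clip = np.stack([clean + rng.normal(0.0, 0.3, clean.shape) for _ in range(5)])

# A 3x3x3 averaging kernel pools information across space AND time,
# mimicking what a learned 3D-conv filter can exploit in a video clip
kernel = np.ones((3, 3, 3)) / 27.0
denoised = conv3d_valid(np.pad(clip, ((1, 1), (1, 1), (1, 1)), mode="edge"), kernel)

# x4 nearest-neighbour upsampling of the centre frame, a placeholder for
# the learned upscaling stage of a super-resolution network
sr = np.repeat(np.repeat(denoised[2], 4, axis=0), 4, axis=1)
print(sr.shape)  # (128, 256): the 32x64 frame upscaled by a factor of 4
```

Averaging over 27 spatio-temporal neighbours reduces the per-pixel noise variance by roughly that factor on smooth regions, which is why multi-frame (3D) filtering outperforms per-frame (2D) filtering at the same spatial footprint; a trained network additionally learns to preserve edges rather than blur them.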