3D time-of-flight (ToF) image sensors are used widely in applications such as self-driving cars, augmented reality (AR), and robotics. When implemented with single-photon avalanche diodes (SPADs), compact, array format sensors can be made that offer accurate depth maps over long distances, without the need for mechanical scanning. However, array sizes tend to be small, leading to low lateral resolution, which combined with low signal-to-background ratio (SBR) levels under high ambient illumination, may lead to difficulties in scene interpretation. In this paper, we use synthetic depth sequences to train a 3D convolutional neural network (CNN) for denoising and upscaling (×4) depth data. Experimental results, based on synthetic as well as real ToF data, are used to demonstrate the effectiveness of the scheme. With GPU acceleration, frames are processed at >30 frames per second, making the approach suitable for low-latency imaging, as required for obstacle avoidance.