by using a physically-based, time-resolved renderer. This approach allows us to tackle a key problem in ToF, the lack of ground-truth data, by using a large-scale captured training set with MPI-corrupted depth to train the encoder, and a smaller synthetic training set with ground truth depth to train the decoder stage of the network. We demonstrate and validate our method on both synthetic and real complex scenarios, using an off-the-shelf ToF camera, and with only the captured, incorrect depth as input.