Autonomous vehicle systems rely heavily on depth estimation, which improves the precision and stability of automated decision-making. Monocular depth estimation is one of the key techniques for making this feasible. UNet is a well-known encoder–decoder architecture for medical image segmentation, and several studies have demonstrated its potential for monocular depth estimation as well. Building on UNet, we propose URNet, a novel monocular depth estimation model that combines the strengths of the classical UNet with residual learning. We evaluate the model on the KITTI dataset using the Eigen split. Compared with other studies, URNet performs significantly better, achieving higher precision and lower error rates. It is therefore well suited to depth estimation in autonomous driving systems.
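
To illustrate what higher precision and lower error usually mean in this setting, the sketch below computes three metrics commonly reported for monocular depth estimation on KITTI: absolute relative error, root mean squared error, and the threshold accuracy δ < 1.25. The abstract does not state which metrics are used, so the choice of metrics, the function name `depth_metrics`, and the flat-list input format are assumptions made purely for illustration.

```python
import math


def depth_metrics(pred, target):
    """Compute common depth metrics over paired depth values.

    `pred` and `target` are flat sequences of depths in the same unit.
    This is an illustrative helper, not the evaluation code of the paper.
    """
    # Keep only pairs where both prediction and ground truth are positive;
    # sparse ground truth often marks missing pixels with 0.
    pairs = [(p, t) for p, t in zip(pred, target) if p > 0 and t > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)

    # Absolute relative error: mean of |pred - target| / target (lower is better).
    abs_rel = sum(abs(p - t) / t for p, t in pairs) / n

    # Root mean squared error (lower is better).
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)

    # Threshold accuracy: fraction of pairs whose ratio max(p/t, t/p)
    # is below 1.25 (higher is better).
    delta1 = sum(1 for p, t in pairs if max(p / t, t / p) < 1.25) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

Under these assumptions, a model with "higher precision" would show a larger `delta1`, and one with "lower error" would show smaller `abs_rel` and `rmse` on the same test split.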