“…Monocular Depth Estimation: Monocular depth estimation has been recently shifted to improving neural network architectures and optimizing methods [59,82,85,35,10,48,65,87], integrating hierarchical features [85,57,65], leveraging camera motion between pairs of frames [97,51,41,72], taking advantage of planner guidance [57,47] and 3D geometric constraints [62,21,61,46]. More recently, audio [81,38,69] has been introduced to help for estimating depth.…”