“…In the past many years, the method of depth detection with lidar has been studied extensively [ 12 , 13 , 14 , 15 , 16 ] while estimating depth information from a single image taken by a monocular camera is attracting more research interest [ 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 ]. Monocular depth estimation is essentially vague and a technically ill-posed problem: With only an image, there will be an infinite number of possible world scenes where the image comes from.…”