With the advancement of Convolutional Neural Networks (CNNs), numerous convolutional neural network (CNN)-based methods have been proposed for depth estimation and have achieved significant achievements. However, the repetitive convolutional layers and spatial pooling layers in these networks often lead to a reduction in spatial resolution and loss of local information, such as edge contours. To address this issue, this study presents a multi-scale monocular depthestimation model. Specifically, a Global Understanding Module (GUM) was introduced on top of a generic encoder to increase the receptive field and capture contextual information. Additionally, the decoding process incorporates a Difference Module and a Multi-scale Cascade Module to guide the decoding information and refine edge contour details. Finally, extensive experiments were conducted using the KITTI and NYUv2 datasets. For the KITTI dataset, the Absolute Relative Error (Abs.Rel) was 0.057, and the Root Mean Squared Error (RMSE) was 2.415. On the NYUv2 dataset, Abs.Rel was 0.104, and RMSE was 0.380. These results indicate that the model performs well in accurately estimating depth information.