In recent years, 3D object detection and localization methods for autonomous driving have attracted increasing attention, but monocular 3D pedestrian localization remains challenging. In this paper, we propose a monocular pedestrian localization framework together with a fine-grained location optimization method based on a bird's eye view. The framework consists of three stages: coarse-grained localization, depth information reconstruction, and fine-grained location optimization. In the coarse-grained localization stage, human skeleton keypoints are extracted from the original image with a skeleton point detection method, and the pedestrian position is then predicted by a lightweight feed-forward neural network. In the depth information reconstruction stage, a parallel network reconstructs from the original image a corresponding bird's eye view that carries depth information. Finally, the fine-grained location optimization stage combines the results of the previous two stages to obtain a more precise location. Experimental results on the KITTI dataset show that our method achieves better performance than state-of-the-art methods.
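To make the coarse-grained stage concrete, the sketch below illustrates the kind of lightweight feed-forward network the abstract describes: it maps detected 2D skeleton keypoints to a rough 3D pedestrian position. The 17-keypoint COCO-style skeleton, the layer widths, and the PyTorch implementation are illustrative assumptions only; the abstract does not specify the paper's exact network architecture or skeleton format.

```python
# Minimal sketch of the coarse-grained localization stage (assumptions:
# 17 COCO-style 2D keypoints as input, a 3D camera-frame position as output;
# layer widths are illustrative, not taken from the paper).
import torch
import torch.nn as nn


class CoarsePedestrianLocalizer(nn.Module):
    """Lightweight feed-forward network: 2D skeleton -> rough 3D position."""

    def __init__(self, num_keypoints: int = 17, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_keypoints * 2, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 3),  # (x, y, z) in the camera frame
        )

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, num_keypoints, 2) pixel coordinates
        return self.mlp(keypoints.flatten(start_dim=1))


if __name__ == "__main__":
    model = CoarsePedestrianLocalizer()
    dummy_skeletons = torch.randn(4, 17, 2)  # a batch of 4 detected pedestrians
    coarse_positions = model(dummy_skeletons)
    print(coarse_positions.shape)  # torch.Size([4, 3])
```

In the full pipeline, these coarse positions would then be refined by the fine-grained optimization step using the depth-aware bird's eye view produced by the parallel reconstruction network.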