Three-dimensional (3D) object detection for intelligent driving has improved substantially through the use of expensive LiDAR and stereo vision systems. Monocular 3D object detection, a less expensive and more scalable alternative, remains a key challenge. This study explores real-time pseudo 3D object detection with monocular vision and designs a single-shot RPN model, VKP-P3D, which relies purely on visual feature extraction. Through multiscale feature fusion and an attention mechanism module, the model obtains high-dimensional feature representations during the feature extraction phase. In the detection head of VKP-P3D, pseudo 3D detection is obtained by regressing the 2D bounding box and the visible key points, in image coordinates, of the 3D box as seen from the camera's perspective. Finally, assuming flat ground and given the geometric parameters of the camera, the object's 3D information can be extracted. To verify the effectiveness of the proposed algorithm, we constructed two pseudo 3D object detection datasets based on visible key points and compared the model with current state-of-the-art real-time object detectors. Results showed that the proposed model achieves high detection accuracy and speed.

INDEX TERMS Monocular vision, pseudo 3D object detection, visual key point, single-shot RPN network.
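
The flat-ground step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an ideal pinhole camera without lens distortion, with known intrinsics, mounted at a known height above a planar road and tilted only in pitch. The function names `pixel_to_ground` and `ground_distance`, their parameters, and the numeric values are hypothetical and chosen only for illustration. Under those assumptions, a key point that touches the road (for example a bottom corner of the 3D box) can be back-projected by intersecting its viewing ray with the ground plane.

```python
import math


def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, pitch=0.0):
    """Back-project pixel (u, v) onto a flat ground plane (illustrative sketch).

    Assumptions (not taken from the paper): pinhole camera, no distortion,
    camera frame with x right, y down, z forward, camera cam_height above
    the ground, positive pitch (radians) tilting the camera downward.
    Returns (x, z): lateral offset and forward distance of the ground point,
    in the same unit as cam_height.
    """
    # Viewing ray through the pixel, expressed in the camera frame.
    dx = (u - cx) / fx
    dy = (v - cy) / fy
    dz = 1.0

    # Rotate the ray about the x-axis into a level (gravity-aligned) frame.
    ry = dy * math.cos(pitch) + dz * math.sin(pitch)
    rz = -dy * math.sin(pitch) + dz * math.cos(pitch)

    # A ray that does not point downward never reaches the ground plane.
    if ry <= 0.0:
        raise ValueError("pixel is at or above the horizon")

    # Scale the ray so that it drops exactly cam_height below the camera.
    t = cam_height / ry
    return dx * t, rz * t


def ground_distance(p, q):
    """Euclidean distance between two ground points given as (x, z)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


if __name__ == "__main__":
    # Hypothetical intrinsics and mounting height (metres).
    cam = dict(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0, cam_height=1.5)
    near = pixel_to_ground(640.0, 510.0, **cam)
    far = pixel_to_ground(640.0, 460.0, **cam)
    print("near corner:", near)
    print("far corner:", far)
    print("edge length on the ground:", ground_distance(near, far))
```

Applying the same back-projection to two ground-contact key points of one box gives an estimate of the corresponding box edge length. The estimate degrades near the horizon, where a small pixel error maps to a large change in distance.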