<p>An accurate and robust perception system is key to understanding the driving environment of autonomous driving and robots. Autonomous driving needs 3D information about objects, including the object’s location and pose, to understand the driving environment clearly. A camera sensor is widely used in autonomous driving because of its richness in color, texture, and low price. The major problem with the camera is the lack of 3D information, which is necessary to understand the 3D driving environment. Additionally, the object’s scale change and cclusion make 3D object detection more challenging. Many deep learning-based methods, such as depth estimation, have been developed to solve the lack of 3D information. This survey presents the image 3D object detection 3D bounding box encoding techniques, feature extraction techniques, and evaluation metrics of 3D object detection. The image-based methods are categorized based on the technique used to estimate an image’s depth information, and insights are added to each method. Then, state-of-the-art (SOTA) monocular and stereo camera-based methods are summarized. We also compare the performance of the selected 3D object detection models and present challenges and future directions in 3D object detection.</p>