This study presents an adaptation of the YOLOv4 deep learning algorithm for 3D object detection, addressing a critical challenge in autonomous vehicle (AV) systems: accurate, real-time perception of the surrounding environment in three dimensions. Traditional 2D detection methods, while efficient, do not provide the depth and spatial information necessary for safe navigation. This research modifies the YOLOv4 architecture to predict 3D bounding boxes, object depth, and orientation. Key contributions include a multi-task loss function that jointly optimizes the 2D and 3D predictions, and sensor fusion techniques that combine RGB camera data with LIDAR point clouds for improved depth estimation. Tested on real-world datasets, the adapted model demonstrates a significant increase in 3D detection accuracy, achieving a mean average precision (mAP) of 85% and an intersection over union (IoU) of 78% at near real-time speeds, with per-class detection rates of 93–97% for vehicles and 75–91% for people. The approach balances high detection accuracy with real-time processing, making it well suited to AV applications. This study advances the field by showing how an efficient 2D detector can be extended to meet the complex demands of 3D object detection in real-world driving scenarios without sacrificing computational efficiency.
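The abstract does not give the form of the multi-task loss; a plausible weighted-sum formulation, in which the individual terms and all lambda weights are illustrative assumptions rather than the paper's exact loss, would be:

\[
\mathcal{L}_{\text{total}} = \lambda_{2D}\,\mathcal{L}_{2D} + \lambda_{3D}\,\mathcal{L}_{\text{box3D}} + \lambda_{z}\,\mathcal{L}_{\text{depth}} + \lambda_{\theta}\,\mathcal{L}_{\text{orient}}
\]

Here \(\mathcal{L}_{2D}\) would be the standard YOLOv4 detection loss, while the remaining terms penalize errors in the 3D box parameters, depth, and orientation; tuning the weights trades 2D against 3D accuracy.

Similarly, the camera-LIDAR fusion step is not detailed here. A minimal sketch of one common approach, projecting LIDAR points into the image plane to obtain a sparse depth map, is shown below; the names (project_lidar_to_image, P, points, image_size) are hypothetical, and a known 3x4 projection matrix from the LIDAR frame to the camera image is assumed.

    import numpy as np

    def project_lidar_to_image(points: np.ndarray, P: np.ndarray,
                               image_size: tuple) -> np.ndarray:
        """Project Nx3 LIDAR points into a sparse per-pixel depth map."""
        h, w = image_size
        # Homogeneous coordinates: (N, 4)
        pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
        # Project into the image plane: (N, 3) rows of (u*z, v*z, z)
        proj = pts_h @ P.T
        # Keep only points in front of the camera
        proj = proj[proj[:, 2] > 0]
        u = (proj[:, 0] / proj[:, 2]).astype(int)
        v = (proj[:, 1] / proj[:, 2]).astype(int)
        depth = proj[:, 2]
        # Discard points that fall outside the image bounds
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        depth_map = np.zeros((h, w), dtype=np.float32)
        depth_map[v[inside], u[inside]] = depth[inside]
        return depth_map

The resulting sparse depth map could then be concatenated with the RGB channels as an additional network input, one straightforward way to realize the fusion the abstract describes.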