Depth estimation is crucial for computer vision applications such as autonomous driving. Active range sensors such as LiDAR and radar are expensive, which makes monocular depth estimation a more cost-efficient alternative. However, deriving accurate depth from a single image is challenging because the problem is inherently under-constrained. Monocular cues such as perspective, scaling, and occlusion aid human depth perception, and deep learning-based models leverage similar cues to map image features to depth values. This research addresses the complexities of monocular depth estimation in the mixed traffic conditions commonly found on Indian roads, with diverse vehicle classes, road surfaces, and unpredictable obstacles; traditional methods often struggle in such scenarios. To overcome this, our study integrates object detection with deep learning models to estimate vehicle distances from frontal camera views. Validated on dashcam and drone footage, the proposed approach achieves an RMSE below 4 meters on both the training and testing datasets. Moreover, the ensemble models reduced RMSE by up to 60% and improved the \(\textnormal{R}^\textnormal{2}\) value by 40%. This solution significantly enhances the spatial awareness of autonomous vehicles, providing a robust means of navigating heterogeneous traffic environments.
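For clarity on the evaluation metrics reported above, the following is a minimal sketch of how RMSE (in meters) and the coefficient of determination \(\textnormal{R}^\textnormal{2}\) are conventionally computed for predicted vehicle distances; the distance values below are hypothetical and do not come from the study's datasets.

```python
import math

def rmse(y_true, y_pred):
    # Root-mean-square error, in the same units as the inputs (meters here).
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical ground-truth and predicted vehicle distances (meters).
true_d = [5.0, 12.0, 20.0, 35.0, 50.0]
pred_d = [6.1, 11.2, 21.5, 33.8, 52.0]
print(rmse(true_d, pred_d))  # well under the 4 m threshold for this toy data
print(r2(true_d, pred_d))
```

An RMSE below 4 meters, as achieved by the proposed approach, means the typical per-vehicle distance error stays within a few meters, which is the scale that matters for collision avoidance in dense traffic.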