Accurate and robust three-dimensional object detection (3DOD) presents a promising opportunity for mobile robot (MR) navigation. Monocular 3DOD techniques typically extend existing 2D object detection (2DOD) frameworks to predict the 3D bounding box (3DBB) of objects captured in 2D RGB images. However, these methods often require multiple images, which makes them less feasible for many real-time scenarios. To address this limitation, the rise of lightweight convolutional neural networks (CNNs) capable of inferring depth from a single image opens a new direction for investigation. This study introduces FDENet, a lightweight network designed to produce cost-effective 3D bounding box estimation (3D-BBE) from a single image. The framework uses PP-LCNet as the encoder and a fast convolutional decoder. It also integrates a Squeeze-and-Excitation (SE) module with MKLDNN optimization to improve convolutional efficiency and reduce model size while keeping training effective. The proposed multi-scale sub-pixel decoder generates high-quality depth maps while remaining lightweight. The resulting depth maps provide a direct view of the distance to objects in the surroundings. This depth information is fused with 2DOD to estimate the 3DBB precisely, facilitating scene understanding and optimal path planning for mobile robots. An obstacle avoidance strategy for the MR is then designed based on the object centers estimated from the 3DBBs. Experimental results show that the model attains state-of-the-art performance on three datasets: NYU-V2, KITTI, and Cityscapes. Overall, the framework shows strong potential for adaptation into intelligent mechatronic systems, especially for building knowledge-driven systems for MR navigation.
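To make the fusion of depth and 2D detection concrete, the following is a minimal sketch of how an object center in 3D could be recovered from a 2D box and a depth map. It is not the FDENet implementation: it assumes a pinhole camera model with known intrinsics (`fx`, `fy`, `cx`, `cy`), a metric depth map aligned with the RGB image, and a box given in pixel coordinates as `(x1, y1, x2, y2)`. The function name `box_center_3d` and the use of the median depth inside the box are illustrative choices, not details taken from the paper.

```python
import numpy as np


def box_center_3d(depth, box, fx, fy, cx, cy):
    """Back-project the center of a 2D box into camera coordinates.

    Hypothetical helper: assumes a pinhole camera and a metric depth map.
    Returns an array [X, Y, Z], or None if the box holds no valid depth.
    """
    x1, y1, x2, y2 = box
    patch = depth[y1:y2, x1:x2]

    # Keep only finite, positive depth readings inside the box.
    valid = patch[np.isfinite(patch) & (patch > 0)]
    if valid.size == 0:
        return None

    # The median is less sensitive to background pixels than the mean.
    z = float(np.median(valid))

    # Pixel coordinates of the box center.
    u = 0.5 * (x1 + x2)
    v = 0.5 * (y1 + y2)

    # Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])


if __name__ == "__main__":
    # Synthetic example: a flat scene 2 m away and a box around the image center.
    depth_map = np.full((480, 640), 2.0)
    center = box_center_3d(depth_map, (300, 220, 340, 260), 500.0, 500.0, 320.0, 240.0)
    print(center)
```

A planner could then compare the returned `Z` (forward distance) and `X` (lateral offset) against a safety margin to decide whether an object lies on the robot's path; that decision rule is likewise an assumption here rather than the strategy described in the paper.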