We present a dense and metric 3D mapping pipeline designed for embedded operation on board UAVs, loosely coupling a deep neural network trained to infer dense depth from single images with a SLAM system that restores metric scale from its sparse depth estimates. In contrast to computationally demanding approaches that leverage multiple views, we propose a highly efficient single-view approach that does not sacrifice 3D mapping performance. This enables real-time construction of a global 3D voxel map by iteratively fusing the rescaled dense depth maps into the map via raycasting from the estimated camera poses. Quantitative and qualitative experiments in challenging environmental conditions show performance comparable or superior to state-of-the-art approaches, with a better effectiveness-efficiency trade-off.
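
The abstract does not specify how metric scale is recovered from the SLAM system's sparse depths; a common, lightweight choice consistent with the loosely coupled design is a single global scale factor per frame, computed as the median ratio between the sparse metric depths and the network's prediction at the pixels where SLAM landmarks project. The following minimal sketch illustrates that scheme; the function name, array layout, and median-ratio scaling are illustrative assumptions, not necessarily the paper's exact method.

```python
import numpy as np

def rescale_dense_depth(dense_depth: np.ndarray, sparse_depth: np.ndarray) -> np.ndarray:
    """Recover metric scale for an up-to-scale dense depth prediction.

    dense_depth:  (H, W) single-view network prediction, up to scale.
    sparse_depth: (H, W) metric depths from SLAM landmarks, 0 where none project.
    Returns the dense map multiplied by one global scale factor.
    """
    # Only use pixels with both a valid sparse depth and a positive prediction.
    mask = (sparse_depth > 0) & (dense_depth > 0)
    if not mask.any():
        raise ValueError("no sparse depth available for scale recovery")
    # Median of per-pixel ratios is robust to outlier landmarks (assumed scheme).
    scale = np.median(sparse_depth[mask] / dense_depth[mask])
    return scale * dense_depth
```

Each rescaled depth map, together with its estimated camera pose, can then be integrated into the global voxel map by raycasting one depth image at a time.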