3D object reconstruction from depth image streams captured by Kinect-style depth cameras has been extensively studied. In this paper, we propose an approach for accurate camera tracking and volumetric dense surface reconstruction, assuming that a known cuboid reference object is present in the scene. Our contribution is threefold. First, we maintain drift-free camera pose tracking by incorporating the 3D geometric constraints of the cuboid reference object into the image registration process. Second, we reformulate the problem of depth stream fusion as a binary classification problem, enabling high-fidelity surface reconstruction, especially in the concave zones of objects. Third, we present a surface denoising strategy that mitigates topological inconsistencies (e.g., holes and dangling triangles) and facilitates the generation of a noise-free triangle mesh. We extend our public dataset CU3D with several new image sequences, test our algorithm on these sequences, and quantitatively compare it with other state-of-the-art algorithms. Both our dataset and our algorithm are available as open source at https://github.com/zhangxaochen/CuFusion so that other researchers can reproduce and verify our results.

INDEX TERMS 3D object reconstruction, depth cameras, Kinect sensors, open source, signal denoising, SLAM.

CHEN ZHANG received the B.S. degree in computer science from Zhejiang University, China, in 2013, and the Ph.D. degree from the College of Computer Science and Technology, Zhejiang University, China. He is currently with the State Key Laboratory of CAD and CG, Computer Animation and Perception Group, Zhejiang University. His primary research interests include simultaneous localization and mapping (SLAM) and 3D reconstruction.