As virtual reality technology advances, 3D environment design and modeling have garnered increasing attention. Applications in networked virtual environments span urban planning, industrial design, and manufacturing, among other fields. However, existing 3D modeling methods exhibit high reconstruction error precision, limiting their practicality in many domains, particularly environmental design. To enhance 3D reconstruction accuracy, this study proposes a digital image processing technology that combines binocular camera calibration, stereo correction, and a convolutional neural network (CNN) algorithm for optimization and improvement. By employing the refined stereo-matching algorithm, a 3D reconstruction model was developed to augment 3D environment design and reconstruction accuracy while optimizing the 3D reconstruction effect. An experiment using the ShapeNet dataset demonstrated that the evaluation indices—Chamfer distance (CD), Earth mover’s distance (EMD), and intersection over union—of the model constructed in this study outperformed those of alternative methods. After incorporating the CNN module in the ablation experiment, CD and EMD increased by an average of 0.1 and 0.06, respectively. This validates that the proposed CNN module effectively enhances point cloud reconstruction accuracy. Upon adding the CNN module, the CD index and EMD index in the dataset increased by an average of 0.34 and 0.54, respectively. These results indicate that the proposed CNN module exhibits strong predictive capabilities for point cloud coordinates. Furthermore, the model demonstrates good generalization performance.