To support the new media presentation and brand building of Dongguan's city memory, this study proposes a virtual reality modeling approach based on multiple perspectives and deep learning. First, to address the imbalance between input and output information in depth map prediction and the poor accuracy of predicted depth map boundaries, a depth map prediction model based on multiple perspectives and deep learning is constructed. Then, because multiple perspectives are often scarce in practice, a single-view modeling framework is proposed, and a dynamic fusion model for single-view virtual reality scenes is built on multi-view generation networks. The results showed that the proposed model achieved the lowest mean square error, average relative error, and average logarithmic error, at 0.52, 0.138, and 0.068, respectively, and the highest multi-threshold accuracies, at 0.772, 0.8821, and 0.947, respectively. The mesh simplification algorithm had the shortest running times, at 1.57, 2.52, 3.91, and 6.53 s. Moreover, the single-view modeling framework achieved the smallest angular distance, 0.1449, and the highest point cloud scene overlap, at 77.47, 79.49, 83.5, and 84.47. In summary, the model performs well in virtual reality modeling and can support the development of virtual reality technology.
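The abstract reports depth-prediction errors and a three-level "multi-threshold accuracy" without defining them. The sketch below uses the definitions common in monocular depth estimation, which is an assumption and not confirmed to be the authors' exact formulation. It computes mean squared error, mean absolute relative error, mean absolute log10 error, and the threshold accuracies δ < 1.25, 1.25², 1.25³ over paired positive depth values. The function name `depth_metrics` is hypothetical.

```python
import math


def depth_metrics(pred, gt):
    """Commonly used depth-map metrics (assumed definitions, not the paper's code).

    pred, gt: equal-length sequences of strictly positive depth values.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equal in length")
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Mean squared error (many papers report its square root, RMSE, instead).
    mse = sum((p - g) ** 2 for p, g in pairs) / n
    # Mean absolute relative error, normalized by the ground-truth depth.
    rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean absolute difference of base-10 logarithms.
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    # Threshold accuracy: share of pixels with max(p/g, g/p) < 1.25**k, k = 1, 2, 3.
    ratios = [max(p / g, g / p) for p, g in pairs]
    delta = tuple(sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3))
    return {"mse": mse, "rel": rel, "log10": log10, "delta": delta}


if __name__ == "__main__":
    m = depth_metrics([2.0, 2.2, 3.0, 3.8], [2.0, 2.0, 2.0, 2.0])
    print(m["delta"])  # (0.5, 0.75, 1.0)
```

Lower values are better for the three error terms, and higher values are better for the three threshold accuracies, which matches how the abstract reports "lowest" errors and "highest" accuracies.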