Radio environment maps (REMs) are an established tool for characterizing spectrum occupancy, supporting more efficient coverage planning and the design of resource allocation algorithms. Deep learning (DL) techniques for REM reconstruction, particularly from a limited number of samples, have attracted significant research attention owing to their speed and accuracy. This is especially relevant for spatial three-dimensional REMs, which involve an exponential increase in the number of samples compared to the two-dimensional case. This paper presents a method for determining the optimal sampling grid resolution based on two key criteria: 1) the similarity of the generated map to the covariance matrix mean (CMM) of measurements collected in real-world (RW) scenarios, and 2) reduced computational complexity. Subsequently, three prominent DL models for REM reconstruction are evaluated, with the convolutional autoencoder (CAE) achieving the best performance. To enhance its accuracy, a neural network (NN) design approach is introduced that assesses, for each layer in the NN architecture, the difference in CMM between the original example and the layer's output. This approach identifies the layers that introduce the smallest difference and determines their optimal number of filters. As a result, the model's complexity is reduced and its accuracy is improved. A normalized root mean square error (NRMSE) as low as −35 dB is achieved for a sampling rate of 30%.
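For readers who want to relate the reported error figure to a relative error, the following is a minimal sketch of one common way to express a normalized root mean square error in decibels. The normalization used here (error power divided by reference power) and the function name `nrmse_db` are assumptions made for illustration; the abstract does not state which normalization the paper uses.

```python
import math


def nrmse_db(reference, estimate):
    """Normalized RMSE in dB between two equal-length sequences of map values.

    Assumed convention (not taken from the paper): the error power is
    normalized by the power of the reference map, so the result equals
    20*log10(RMS error / RMS reference).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    err_power = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    ref_power = sum(r ** 2 for r in reference)
    # Note: a perfect reconstruction (err_power == 0) has no finite dB value,
    # and math.log10 raises ValueError in that case.
    return 10.0 * math.log10(err_power / ref_power)


# Hypothetical flattened map values, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0]
# Every value overestimated by 10 percent.
estimate = [1.1 * r for r in reference]
value = nrmse_db(reference, estimate)  # close to -20 dB
```

Under this assumed convention, −35 dB would correspond to an RMS error of roughly 1.8% of the RMS value of the reference map, since 10^(−35/20) ≈ 0.0178.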