Digital elevation models (DEMs) acquired through photogrammetry or LiDAR often contain voids caused by instrument artifacts, ground occlusion, and similar phenomena. To fill these voids, this paper proposes a multiattention generative adversarial network model. In this model, a multiscale feature fusion generation network first produces an initial fill of the voids, and a multiattention filling network then recovers the detailed features of the terrain surrounding the void area; a channel-spatial cropping attention mechanism module is proposed to enhance the network. Spectral normalization is added to each convolution layer in the discriminator network. Finally, the model is trained by optimizing a combined loss function that includes a reconstruction loss and an adversarial loss. Three groups of experiments on four types of terrain (hillsides, valleys, ridges, and hills) are conducted to validate the proposed model. The experimental results show that the structural similarity surrounding terrestrial voids in three of the terrain types (hillside, valley, and ridge) can reach 80–90%, which implies that the DEM accuracy can be improved by at least 10% relative to traditional interpolation methods (Kriging, IDW, and Spline). In hilly areas, the proposed model reaches 57.4%, while other deep learning models (CE, GL, and CR) reach only 43.2%, 17.1%, and 11.4%, respectively. It can therefore be concluded that the structural similarity surrounding terrestrial voids filled using the proposed model can reach 60–90%, depending on the type of terrain (hillside, valley, ridge, or hill).
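The combined loss mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact form of either term or their weights, so everything below is an assumption made for illustration: the reconstruction term is taken to be a mean absolute error restricted to void cells, the adversarial term is the common non-saturating generator loss, and the names `reconstruction_loss`, `adversarial_loss`, `combined_loss`, `lambda_rec`, and `lambda_adv` are hypothetical.

```python
import math


def reconstruction_loss(pred, target, void_mask):
    """Mean absolute elevation error over void cells only.

    Assumption: a mask value of 1 marks a void cell; the paper's actual
    reconstruction term may be defined differently.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, void_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


def adversarial_loss(disc_scores, eps=1e-12):
    """Non-saturating generator loss: -mean(log D(G(x))).

    Assumption: disc_scores are discriminator probabilities in (0, 1]
    for the filled DEM patches.
    """
    return -sum(math.log(max(s, eps)) for s in disc_scores) / len(disc_scores)


def combined_loss(pred, target, void_mask, disc_scores,
                  lambda_rec=1.0, lambda_adv=0.001):
    """Weighted sum of the two terms; the weights are placeholder values."""
    return (lambda_rec * reconstruction_loss(pred, target, void_mask)
            + lambda_adv * adversarial_loss(disc_scores))


if __name__ == "__main__":
    # Toy 2x2 DEM patch with a single void cell in the lower-right corner.
    pred = [[1.0, 2.0], [3.0, 5.0]]
    target = [[1.0, 2.0], [3.0, 4.0]]
    mask = [[0, 0], [0, 1]]
    print(combined_loss(pred, target, mask, disc_scores=[0.5]))
```

In this sketch the reconstruction term keeps the filled elevations close to the reference surface, while the small adversarial weight nudges the generator toward fills that the discriminator cannot distinguish from real terrain.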