Velocity model inversion is one of the most important tasks in seismic exploration. Full waveform inversion (FWI) achieves the highest resolution among traditional velocity inversion methods, but it depends heavily on the initial model and is computationally expensive. In recent years, many deep-learning-based velocity model inversion methods have been proposed. A critical component of these methods is a large training set containing diverse velocity models. We propose a method for constructing realistic structural models for training deep learning networks. Our P-wave velocity model building method creates dense-layer, fault, and salt body models, and it can generate a large number of them automatically with little human effort, which makes it well suited to producing training data. Moreover, to improve the inversion results on these realistic structural models, we propose extracting features from common-receiver gathers in addition to common-shot gathers. With a large number of realistic structural models, a reasonable data acquisition design, and appropriate network setups, the proposed inversion framework generalizes better, as demonstrated on an independent test data set. We compare and analyze the results for dense-layer, fault, and salt body models separately. The comparison demonstrates the reliability of the proposed method and provides practical guidelines for choosing inversion strategies in realistic situations.