Shear-wave velocity plays an important role in both reservoir prediction and pre-stack inversion. However, current deep-learning-based methods for predicting shear-wave velocity have several limitations, including insufficient training data, poor model generalization, and limited physical interpretability. In this study, theoretical rock physics models are introduced into the construction of the labeled dataset for deep learning, and forward simulation with these models is used to supplement the dataset with samples that incorporate geological and geophysical knowledge. This markedly increases the physical interpretability of the deep learning algorithm. Theoretical rock physics models are first established for two types of reservoirs: conventional sandstone and tight sandstone. A full-sample labeled dataset is then constructed by randomly varying and combining the parameters of these two models so that the elastic parameter space of both reservoir types is traversed. Finally, four parameters that are highly correlated with shear-wave velocity (P-wave velocity, clay content, porosity, and density) are selected from this dataset as inputs to a deep neural network, yielding a shear-wave velocity prediction network with good generalization and robustness that can be applied directly to field data. The errors between the predicted shear-wave velocities and both laboratory measurements and well-log data from three field work areas are less than 5%, much smaller than the errors of Han's and Castagna's empirical formulas. The network also generalizes better than these two common empirical formulas.
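For reference, the two empirical baselines and the relative-error check behind the 5% figure can be sketched in Python. This is a minimal sketch that assumes the commonly quoted Vs-Vp forms, with velocities in km/s: Castagna's mudrock line, Vp = 1.16 Vs + 1.36, and Han's linear regression, Vs = 0.7936 Vp - 0.7868. The study may use other variants, such as Han's porosity and clay-content regression. The helper names (`castagna_vs`, `han_vs`, `relative_errors`, `all_within`) and the sample values are hypothetical and are not data from the study.

```python
"""Empirical Vs baselines and a relative-error check (illustrative sketch).

ASSUMPTION: the commonly quoted Vs-Vp forms of both relations are used,
with velocities in km/s; the study may use different variants.
"""
from typing import List, Sequence


def castagna_vs(vp: float) -> float:
    """Castagna et al. (1985) mudrock line Vp = 1.16*Vs + 1.36, solved for Vs."""
    return (vp - 1.36) / 1.16


def han_vs(vp: float) -> float:
    """Han (1986) linear Vs-Vp regression for water-saturated sandstones."""
    return 0.7936 * vp - 0.7868


def relative_errors(pred: Sequence[float], measured: Sequence[float]) -> List[float]:
    """Per-sample relative error |pred - measured| / measured."""
    if len(pred) != len(measured):
        raise ValueError("pred and measured must have the same length")
    return [abs(p - m) / m for p, m in zip(pred, measured)]


def all_within(pred: Sequence[float], measured: Sequence[float], tol: float = 0.05) -> bool:
    """True if every sample's relative error is below tol (5% by default)."""
    return all(e < tol for e in relative_errors(pred, measured))


if __name__ == "__main__":
    # Hypothetical (Vp, Vs) pairs in km/s, NOT measurements from the study.
    vp_log = [3.2, 3.8, 4.5]
    vs_log = [1.75, 2.10, 2.60]
    for name, formula in [("Castagna", castagna_vs), ("Han", han_vs)]:
        pred = [formula(v) for v in vp_log]
        errs = relative_errors(pred, vs_log)
        print(name, [round(e, 3) for e in errs], "all < 5%:", all_within(pred, vs_log))
```

The same `relative_errors` check would apply unchanged to the network's predictions, which keeps the comparison against the baselines on one metric.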
Forward simulation based on theoretical models supplements the training dataset and provides high-quality labels for machine learning. This can considerably improve the interpretability and generalization of machine learning models in real applications.
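The forward-simulation step can be illustrated with a small Python sketch that samples porosity and clay content at random, passes each sample through a rock physics model, and returns the four input features (P-wave velocity, clay content, porosity, density) with shear-wave velocity as the label. The study's own conventional and tight sandstone models are not given here, so this sketch swaps in a textbook workflow: Voigt-Reuss-Hill mineral mixing, Nur's critical-porosity dry frame, and Gassmann brine substitution. The end-member properties, parameter ranges, the treatment of clay as a fraction of the solid phase, and the function names (`vrh`, `forward_model`, `make_dataset`) are all assumptions for illustration.

```python
"""Synthetic labeled dataset from a rock-physics forward model (illustrative sketch).

ASSUMPTION: the study's sandstone and tight-sandstone models are not reproduced.
Stand-in workflow: Voigt-Reuss-Hill mixing -> Nur critical-porosity dry frame
-> Gassmann brine substitution. Moduli in GPa and densities in g/cm^3, so
velocities come out in km/s.
"""
import numpy as np

# Assumed end-member properties (typical textbook values).
K_QTZ, MU_QTZ, RHO_QTZ = 37.0, 44.0, 2.65
K_CLAY, MU_CLAY, RHO_CLAY = 21.0, 7.0, 2.58
K_BRINE, RHO_BRINE = 2.25, 1.0
PHI_CRIT = 0.40  # assumed critical porosity for sandstone


def vrh(f_clay, m_qtz, m_clay):
    """Voigt-Reuss-Hill average modulus of a quartz-clay mineral mix."""
    voigt = (1 - f_clay) * m_qtz + f_clay * m_clay
    reuss = 1.0 / ((1 - f_clay) / m_qtz + f_clay / m_clay)
    return 0.5 * (voigt + reuss)


def forward_model(phi, clay):
    """Return (vp, vs, rho) of brine-saturated rock.

    phi: porosity (fraction, 0 < phi < PHI_CRIT).
    clay: clay fraction of the solid phase (an assumption of this sketch).
    """
    k0 = vrh(clay, K_QTZ, K_CLAY)
    mu0 = vrh(clay, MU_QTZ, MU_CLAY)
    # Nur critical-porosity dry frame: moduli fall linearly to zero at PHI_CRIT.
    k_dry = k0 * (1 - phi / PHI_CRIT)
    mu_dry = mu0 * (1 - phi / PHI_CRIT)
    # Gassmann: the pore fluid changes the bulk modulus but not the shear modulus.
    k_sat = k_dry + (1 - k_dry / k0) ** 2 / (
        phi / K_BRINE + (1 - phi) / k0 - k_dry / k0**2
    )
    rho_min = (1 - clay) * RHO_QTZ + clay * RHO_CLAY
    rho = (1 - phi) * rho_min + phi * RHO_BRINE
    vp = np.sqrt((k_sat + 4.0 / 3.0 * mu_dry) / rho)
    vs = np.sqrt(mu_dry / rho)
    return vp, vs, rho


def make_dataset(n, seed=0):
    """Randomly traverse (phi, clay) and return features X and labels y.

    X columns: [Vp, clay, phi, rho]; y: Vs. Ranges below are assumptions.
    """
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0.02, 0.35, n)
    clay = rng.uniform(0.0, 0.5, n)
    vp, vs, rho = forward_model(phi, clay)
    X = np.column_stack([vp, clay, phi, rho])
    return X, vs
```

In this sketch, `X, y = make_dataset(10000)` would yield feature and label arrays for a regressor such as a fully connected network; a tight-sandstone model or parameter range would be sampled separately and concatenated to cover both reservoir types.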