Finite-difference methods are the most widely used methods for seismic wavefield simulation, but numerical dispersion is the main obstacle to accurate simulation. When the finite-difference scheme is known, the time dispersion can be predicted mathematically and therefore eliminated. In practice, however, often only pre-compiled software is available for wavefield simulation, so the underlying algorithm is a black box. It is then difficult to derive a mathematical expression for the time dispersion, and hence difficult to eliminate it. To solve this problem, we propose using deep learning to eliminate time dispersion. We design a semi-supervised framework based on convolutional and recurrent neural networks for eliminating the time dispersion introduced by seismic wave modeling. The framework comprises two main modules, an Inverse Model and a Forward Model, both of which have learnable parameters. The Inverse Model eliminates time dispersion, while the Forward Model regularizes the training. The workflow has two steps. First, we use the compiled modeling software to generate two data sets, one with a large time step and one with a small time step. Second, we train the two modules to transform between large time-step data (with time dispersion) and small time-step data (without time dispersion) using labeled and unlabeled data sets. Here, the labeled data set consists of large time-step data paired with the corresponding small time-step data, and the unlabeled data set is the large time-step data that need time-dispersion elimination. The unlabeled data set is used to guide the network. In this learning framework, re-training is required whenever the modeling algorithm, time interval, or frequency band changes.
Hence, we propose a transfer learning training method that extends a trained model to another model, reducing the computational cost of re-training. The remaining re-training cost is a minor drawback that is far outweighed by the efficiency gained from modeling with large time steps in large-scale production. Tests on two models confirm the effectiveness of the proposed method.
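
To make concrete the statement that time dispersion can be predicted when the scheme is known, the following minimal sketch assumes a second-order central difference in time applied to a single harmonic component, u'' = -ω²u. This is an illustrative assumption, not the algorithm of any particular modeling software, and all function names and parameter values are hypothetical. For this assumed scheme, the propagated angular frequency ω_num satisfies sin(ω_num Δt / 2) = ω Δt / 2, so the large time-step trace can be predicted in closed form.

```python
import math


def numerical_omega(omega, dt):
    """Angular frequency propagated by the assumed central-difference scheme.

    Follows from sin(omega_num * dt / 2) = omega * dt / 2 (needs omega * dt < 2).
    """
    return 2.0 / dt * math.asin(omega * dt / 2.0)


def simulate(omega, dt, n_steps):
    """Integrate u'' = -omega^2 u, u(0) = 1, u'(0) = 0, by central differences."""
    a = (omega * dt) ** 2
    u_prev = 1.0                # u at t = 0
    u_curr = 1.0 - 0.5 * a      # Taylor start-up step for zero initial velocity
    trace = [u_prev, u_curr]
    for _ in range(n_steps - 1):
        u_next = (2.0 - a) * u_curr - u_prev
        trace.append(u_next)
        u_prev, u_curr = u_curr, u_next
    return trace


def max_abs_error(trace, dt, omega_ref):
    """Largest deviation of a trace from cos(omega_ref * t)."""
    return max(abs(u - math.cos(omega_ref * n * dt)) for n, u in enumerate(trace))


# Illustrative values only (not taken from the paper).
omega = 2.0 * math.pi * 20.0       # 20 Hz component
duration = 1.0                     # seconds
dt_large, dt_small = 2e-3, 1e-4    # "large" and "small" time steps

large = simulate(omega, dt_large, round(duration / dt_large))
small = simulate(omega, dt_small, round(duration / dt_small))

err_large_vs_exact = max_abs_error(large, dt_large, omega)
err_small_vs_exact = max_abs_error(small, dt_small, omega)
err_large_vs_predicted = max_abs_error(
    large, dt_large, numerical_omega(omega, dt_large)
)

print(f"large step vs exact solution:     {err_large_vs_exact:.3e}")
print(f"small step vs exact solution:     {err_small_vs_exact:.3e}")
print(f"large step vs predicted solution: {err_large_vs_predicted:.3e}")
```

In this toy setting the large time-step trace drifts visibly from the exact solution yet follows the predicted dispersed frequency, while the small time-step trace stays close to the exact one. The two traces play the roles of the paired large and small time-step data described above. When the scheme is a black box, no such closed-form relation is available, which is the situation the learned Inverse Model is meant to address.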