We propose an optimisation algorithm based on the sequence-to-sequence (Seq2Seq) stacking of gated recurrent unit (GRU) models to characterise and approximate the forward problem of partial differential equations (PDEs). Unlike traditional mesh-based finite-difference methods, which are non-parametric, this is a meshless approach based on parametric semi-supervised learning. Specifically, the algorithm exploits the ability of deep recurrent neural networks to approximate continuous dynamical systems, enhanced by stacked GRU modules that capture the temporal evolution of the PDEs and thereby enrich the sequence representations in the hidden space. The loss function incorporates partial physical knowledge as an a priori condition to guide the optimisation direction, i.e., it transforms the numerical iteration problem into a non-convex optimisation problem. In addition, each training round resamples the collocation data to prevent overfitting. We evaluated the ability of the proposed algorithm to solve mathematical physics equations with a variety of differential operators and constraints, including the heat, wave, Burgers, Schrödinger, diffusion, and Kovasznay flow equations. The experimental results confirm the high prediction accuracy and generalisation capability of the proposed algorithm.

INDEX TERMS Forward problem of partial differential equations, deep learning, gated recurrent unit, sequence-to-sequence.
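To make the physics-informed loss concrete, the following is a minimal NumPy sketch of the kind of objective the abstract describes: a supervised data term plus a PDE-residual term acting as the a priori physical constraint. The example uses the 1-D heat equation u_t = α u_xx with residuals estimated by central differences on a predicted space-time grid; the function names, the finite-difference residual, and the weighting parameter `lam` are illustrative assumptions, not the paper's exact formulation (which embeds the residual in a Seq2Seq GRU training loop).

```python
import numpy as np

def heat_residual_loss(u, dx, dt, alpha=0.01):
    """Mean-squared residual of u_t = alpha * u_xx, estimated by
    central differences on a grid u[t, x] of network predictions.
    Interior points only, so the slices stay aligned."""
    u_t = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2.0 * dt)          # time derivative
    u_xx = (u[1:-1, 2:] - 2.0 * u[1:-1, 1:-1]
            + u[1:-1, :-2]) / dx**2                           # spatial curvature
    return float(np.mean((u_t - alpha * u_xx) ** 2))

def total_loss(u_pred, u_data, mask, dx, dt, lam=1.0, alpha=0.01):
    """Semi-supervised objective: data misfit on the few labelled
    points (selected by `mask`) plus the physics prior as a soft
    penalty. `lam` balances the two terms (an assumed hyperparameter)."""
    data_term = float(np.mean((u_pred[mask] - u_data[mask]) ** 2))
    return data_term + lam * heat_residual_loss(u_pred, dx, dt, alpha)
```

A prediction that satisfies the PDE drives the residual term toward zero, so minimising this objective pushes the network output toward both the observed data and the governing physics; resampling the grid points between rounds, as the abstract notes, changes which residuals are penalised each epoch.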