The modeling and control of distributed parameter systems (DPSs) have received a great deal of attention. Because linear model order reduction (MOR) methods may ignore nonlinear dynamics and lose detail, common modeling methods struggle to describe DPSs accurately. To model such systems effectively, this paper proposes a sparse stacked auto-encoder and gated recurrent unit (SSAE-GRU) model. Under time/space separation theory, the mainstream approach is to perform MOR and time-series identification as two separate steps. The SSAE-GRU model retains this two-part structure but trains both parts jointly. The SSAE serves as an effective MOR technique, and its sparse activation strategy keeps the model space simple and easy to train. The GRU can represent complex temporal behavior because information stored in previous hidden states is passed selectively to the current time step. Joint training lets the two components account together for both the connection between adjacent time steps and the spatial energy transfer. L2 regularization is applied during back-propagation to ease model optimization and prevent overfitting. The modeling scheme is simulated on two typical chemical thermal processes. The results demonstrate the effectiveness of the proposed method and its strong performance compared with existing methods.

INDEX TERMS Distributed parameter systems, model order reduction, sparse stacked auto-encoder, gated recurrent unit, joint learning
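As a rough illustration of the joint objective described above, the following minimal sketch chains a stacked sigmoid encoder (spatial reduction), a GRU cell (temporal dynamics), and a linear decoder, and evaluates a single loss made of one-step prediction error, a sparsity penalty, and an L2 weight penalty. It is not the paper's implementation: the KL-divergence form of the sparsity term, the linear decoder, the layer sizes, the weights `beta` and `lam`, the synthetic travelling-wave data, and all function names are assumptions chosen for illustration only.

```python
import math
import random

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def encode(x, layers):
    """Stacked encoder: each layer is a sigmoid(W a) map to a smaller dimension."""
    a = x
    for W in layers:
        a = sigmoid(matvec(W, a))
    return a

def gru_step(a, h, p):
    """One GRU update: the gates decide how much of the previous state h is kept."""
    z = sigmoid(add(matvec(p["Wz"], a), matvec(p["Uz"], h)))  # update gate
    r = sigmoid(add(matvec(p["Wr"], a), matvec(p["Ur"], h)))  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(c) for c in add(matvec(p["Wh"], a), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def kl_sparsity(codes, rho=0.05):
    """Assumed sparsity term: KL divergence between target rho and mean activation."""
    n = len(codes)
    total = 0.0
    for j in range(len(codes[0])):
        rho_hat = sum(c[j] for c in codes) / n
        total += (rho * math.log(rho / rho_hat)
                  + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))
    return total

def joint_loss(snapshots, enc, gru, dec, beta=0.1, lam=1e-3):
    """Prediction error + sparsity penalty + L2 weight penalty in one objective."""
    codes = [encode(x, enc) for x in snapshots]
    h = [0.0] * len(gru["Uz"])
    mse = 0.0
    for t in range(len(snapshots) - 1):
        h = gru_step(codes[t], h, gru)
        pred = matvec(dec, h)  # assumed linear decoder back to the spatial grid
        mse += sum((p - y) ** 2 for p, y in zip(pred, snapshots[t + 1]))
    mse /= (len(snapshots) - 1) * len(snapshots[0])
    mats = enc + [dec] + list(gru.values())
    l2 = sum(w * w for M in mats for row in M for w in row)
    return mse + beta * kl_sparsity(codes) + lam * l2

rng = random.Random(0)
n_space, n_mid, n_code, n_hidden, n_time = 8, 5, 3, 4, 6  # hypothetical sizes
enc = [rand_matrix(n_mid, n_space, rng), rand_matrix(n_code, n_mid, rng)]
gru = {k: rand_matrix(n_hidden, n_code if k[0] == "W" else n_hidden, rng)
       for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
dec = rand_matrix(n_space, n_hidden, rng)
# Synthetic spatiotemporal field: a travelling sine wave sampled on the grid.
snapshots = [[math.sin(0.5 * i + 0.3 * t) for i in range(n_space)]
             for t in range(n_time)]
loss = joint_loss(snapshots, enc, gru, dec)
```

Because every weight matrix enters the same scalar `loss`, a gradient-based optimizer would update the encoder and the GRU together, which is the sense of "joint learning" used here. The sketch only evaluates the objective; actual training would require gradients, for example from an automatic differentiation framework.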