With the growing popularity of deep learning (DL), an increasing number of studies have sought to replace time-consuming numerical simulations with efficient surrogate models for predicting the production of multi-stage fractured horizontal wells. Previous studies on surrogate models for this task have often applied existing deep learning architectures directly, without incorporating physical constraints into the model. Because a large number of variables is required to characterize fracture properties, the input variables of such proxy models are often oversimplified, and much of the physical information is lost; consequently, predictions are sometimes inconsistent with the underlying physics of the domain. In this study, by modifying the traditional Seq2Seq (LSTM–LSTM) deep learning architecture, a physics-informed encoder–decoder (PIED) architecture was developed to surrogate the numerical simulation codes for predicting the production of horizontal wells with unequal-length, intersecting hydraulic fractures on a 2D plane. The encoder is an LSTM network, and the decoder consists of LSTM and fully connected layers; an attention mechanism is also applied within the Seq2Seq architecture. The PIED model's encoder extracts the physical information related to the fractures, and the attention module passes the physical information most relevant to production on to the decoder during training. Through the modification of the Seq2Seq architecture, the decoder of the PIED incorporates an intermediate input, the constant production time, along with the extracted physical information to predict production values. The PIED model excels at extracting sufficient physical information from high-dimensional inputs while preserving the integrity of the production time information. By accounting for these physical constraints, the model predicts production values with improved accuracy and generalization. In a case study, a multi-layer perceptron (MLP), which is broadly used as a proxy model; a regular Seq2Seq model (LSTM–Attention–LSTM); and the PIED were compared, yielding MAE values of 241.76, 184.07, and 168.81, respectively; the proposed model therefore has higher accuracy and better generalization ability. A further comparative experiment between LSTM–MLP (with an MAE of 221.50) and LSTM–LSTM demonstrated that using an LSTM as the decoder is better suited to predicting production series; that is, LSTM outperforms MLP in the task of predicting production sequences. The Seq2Seq architecture performed well on this problem overall, achieving a 48.4% reduction in MSE compared to the MLP. Finally, the time cost of building datasets was considered, and the proposed model was found to be trainable on a small dataset (in the case study, 3 days were needed to generate 450 training samples); thus, the proposed model has a certain degree of practicality.
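The abstract describes the PIED only at a high level. The following PyTorch sketch illustrates one plausible reading of that description: an LSTM encoder over per-fracture property vectors, an attention module that summarizes the encoder states for the decoder, and an LSTM decoder that receives the production time at each step before a fully connected output head. All layer sizes, the dot-product attention scorer, and the class and argument names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PIED(nn.Module):
    """Illustrative sketch of a physics-informed encoder-decoder (PIED).

    The encoder LSTM reads a sequence of per-fracture property vectors
    (the high-dimensional physical inputs). At each decoding step, the
    decoder LSTM receives the production time for that step concatenated
    with an attention-weighted summary of the encoder states, and a fully
    connected head maps the decoder state to a production value.
    Dimensions and the attention form are assumptions, not the paper's.
    """

    def __init__(self, frac_dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(frac_dim, hidden, batch_first=True)
        # Decoder input: 1 scalar (production time) + attention context.
        self.decoder = nn.LSTM(1 + hidden, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, hidden)  # projects decoder state into a query
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, fractures, times):
        # fractures: (batch, n_fracs, frac_dim); times: (batch, n_steps, 1)
        enc_out, (h, c) = self.encoder(fractures)
        preds = []
        for t in range(times.size(1)):
            # Dot-product attention between decoder state and encoder outputs.
            query = self.attn(h[-1]).unsqueeze(2)       # (batch, hidden, 1)
            scores = torch.bmm(enc_out, query)          # (batch, n_fracs, 1)
            weights = torch.softmax(scores, dim=1)
            context = (weights * enc_out).sum(dim=1)    # (batch, hidden)
            step_in = torch.cat([times[:, t], context], dim=1).unsqueeze(1)
            dec_out, (h, c) = self.decoder(step_in, (h, c))
            preds.append(self.head(dec_out.squeeze(1)))
        return torch.stack(preds, dim=1)                # (batch, n_steps, 1)

# Usage with made-up dimensions: 4 wells, 12 fractures of 8 properties each,
# and a fixed grid of 30 production time steps.
model = PIED(frac_dim=8)
fracs = torch.randn(4, 12, 8)
times = torch.linspace(0.0, 1.0, 30).repeat(4, 1).unsqueeze(-1)
production = model(fracs, times)  # (4, 30, 1) predicted production series
```

Feeding the production time into the decoder at every step, rather than folding it into the encoder inputs, is one way to read the abstract's claim that the architecture preserves the integrity of the production time information alongside the extracted fracture physics.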