Recently, recurrent neural networks (RNNs), such as Long Short-Term Memory (LSTM) networks, have been widely used for sequence prediction in complex scenes [3]-[5]. Their memory units retain continuous historical features, which makes them rather effective at mitigating error propagation, but they perform poorly at modeling social interactions. Building on the strengths of RNNs, an alternative family of methods based on Generative Adversarial Networks (GANs) has been proposed to model the uncertainty of pedestrian trajectories caused by potential factors in real scenes [6], [7]. However, GAN-based methods mostly process each trajectory separately when modeling interactions, which ignores some key factors of social interaction and incurs a high computational cost. Currently, graph-based and graph neural network-based methods [8]-[10] are widely used, because graph structures offer a more intuitive and interpretable way to model physical and social interactions among pedestrians. However, most of them still have limitations when handling social interactions: some assume fixed connections among all pedestrians, while others sample a fixed number of neighbors to build the graph. As a result, these approaches either introduce considerable noise or lose implicit information, such as time, when modeling social interactions. Moreover, they do not give sufficient consideration to error accumulation, which results in suboptimal predictions.