“…Recently, the Transformer architecture [63] has been applied to this task [21,62,76,77] to model spatio-temporal relations via an attention mechanism. Moreover, various lines of work have emerged that target more practical applications, including goal-driven approaches [13,40,60,81], long-tail scenarios [39], interpretability [32], robustness [9,66,70,80], counterfactual analysis [11], planning-driven approaches [12], generalization to new environments [6,27,72], and knowledge distillation [44].…”