“…1. In this field, previous works have used 2D poses [2], [3], pedestrian bounding boxes [4], optical flow [5], scene context [6], vehicle speeds [7], trajectories [8], and vehicle ego-motion [7]. Meanwhile, deep learning models such as I3D [5], LSTM/RNN-based temporal models [8], [9], and transformers [10] have been adopted in recent years.…”