“…Better prediction results are achievable by fusing visual features with coordinate information (Varshneya and Srinivasaraghavan 2017; Xue, Huynh, and Reynolds 2018; Manh and Alaghband 2018; Sadeghian et al. 2019; Liang et al. 2019; Kosaraju et al. 2019; Sun, Zhao, and He 2020; Dendorfer, Elflein, and Leal-Taixé 2021; Zhao et al. 2019; Tao, Jiang, and Duan 2020; Sun, Jiang, and Lu 2020; Shafiee, Padir, and Elhamifar 2021; Chai et al. 2019). Recently, Gaussian distributions (Hug, Hübner, and Arens 2020; Hug et al. 2022; Xu, Yang, and Du 2020), generative adversarial networks (GANs) (Gupta et al. 2018; Sadeghian et al. 2019; Kosaraju et al. 2019; Li 2019; Dendorfer, Elflein, and Leal-Taixé 2021), and Conditional Variational Auto-encoders (CVAEs) (Lee et al. 2017; Ivanovic and Pavone 2019; Salzmann et al. 2020; Chen et al. 2021b; Yao et al. 2021; Xu et al. 2022a; Wang et al. 2022; Yue, Manocha, and Wang 2022; Xu, Hayet, and Karamouzas 2022; Wen, Wang, and Metaxas 2022) have been proposed to infer multiple socially acceptable trajectories.…”