“…When observing humans in cities and predicting their behavior, a common pipeline first detects them in 2D/3D images, then tracks them across consecutive images (video) by assigning each person a unique identifier, and finally predicts their future behavior. The behavior prediction task has been tackled in the literature in several forms: by classifying among a set of possible motion patterns (GAVRILA, 2013; KOEHLER et al., 2013; BONNIN et al., 2014; VÖLZ et al., 2015; HASHIMOTO et al., 2015b; KWAK; KO; NAM, 2017), by predicting a single future trajectory (QUINTERO et al., 2015; GOLDHAMMER et al., 2015; FERGUSON et al., 2015; SCHULZ; STIEFELHAGEN, 2015a), or by predicting multiple possible trajectories (GUPTA et al., 2018; SADEGHIAN et al., 2019; AMIRIAN; HAYET; PETTRÉ, 2019; LEE et al., 2017; TRIVEDI, 2019; CUI et al., 2019; ZYNER; WORRALL; NEBOT, 2019).…”