Learning Decoupled Representations for Human Pose Forecasting

Parsaeifard, Behnam; Saadatnejad, Saeed; Liu, Yuejiang; Mordan, Taylor; Alahi, Alexandre

doi:10.1109/iccvw54120.2021.00259

Cited by 16 publications

(12 citation statements)

References 43 publications

“…Results of SPADE baseline [16] are obtained by re-training their model with their hyper-parameters publicly available. 2 c) Qualitative results: Qualitative results are shown in Figure 3 for Cityscapes and in Figure 4 for CMP Facades. Having the semantic matching head and different feature maps, each focusing on a specific object, could generate more semantically consistent details, e.g., the windows and balconies are less blurry and with more details for the facades.…”

Section: Methods (mentioning)

confidence: 99%

“…Take the joint distribution of training data as p * (x, s), the goal is to find an approximate joint distribution p θ (x, s). The full objective function was defined in Equation (1) and Equation (2). For simplicity, we ignore the reconstruction loss here and therefore the objective function is as follows:…”

Section: E Stabilizing the Training (mentioning)

confidence: 99%

“…SAFETY is the primary concern when developing autonomous vehicles (AVs). For example, a wrong action in an unexpected situation can lead to a collision with a pedestrian, which is not negligible [1], [2]. Yet, strictly evaluating AVs in the real world is not a realistic nor a safe option.…”

Section: Introduction (mentioning)

confidence: 99%

A Shared Representation for Photorealistic Driving Simulators

Saadatnejad, Liu, Mordan, et al. 2022

IEEE Trans. Intell. Transport. Syst.

Self Cite

A powerful simulator highly decreases the need for real-world tests when training and evaluating autonomous vehicles. Data-driven simulators flourished with the recent advancement of conditional Generative Adversarial Networks (cGANs), providing high-fidelity images. The main challenge is synthesizing photorealistic images while following given constraints. In this work, we propose to improve the quality of generated images by rethinking the discriminator architecture. The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses. We build on successful cGAN models to propose a new semantically-aware discriminator that better guides the generator. We aim to learn a shared latent representation that encodes enough information to jointly do semantic segmentation, content reconstruction, along with a coarse-to-fine grained adversarial reasoning. The achieved improvements are generic and simple enough to be applied to any architecture of conditional image synthesis. We demonstrate the strength of our method on the scene, building, and human synthesis tasks across three different datasets. The code is available at https://github.com/vita-epfl/SemDisc.

“…In these models, each joint is represented as a node and each relation between joints as an edge. In the literature, this problem is still handled in single-modal setting, despite a recent attempt to better consider randomness [49].…”

Section: Multiple Trajectory Prediction In Robotics (mentioning)

confidence: 99%

Deep variational learning for multiple trajectory prediction of 360° head movements

Guimard, Sassatelli, Marchetti, et al. 2022

Proceedings of the 13th ACM Multimedia Systems Conference

Prediction of head movements in immersive media is key to design efficient streaming systems able to focus the bandwidth budget on visible areas of the content. Numerous proposals have therefore been made in the recent years to predict 360° images and videos. However, the performance of these models is limited by a main characteristic of the head motion data: its intrinsic uncertainty. In this article, we present an approach to generate multiple plausible futures of head motion in 360° videos, given a common past trajectory. Our method provides likelihood estimates of every predicted trajectory, enabling direct integration in streaming optimization. To the best of our knowledge, this is the first work that considers the problem of multiple head motion prediction for 360° video streaming. We first quantify this uncertainty from the data. We then introduce our discrete variational multiple sequence (DVMS) learning framework, which builds on deep latent variable models. We design a training procedure to obtain a flexible and lightweight stochastic prediction model compatible with sequence-to-sequence recurrent neural architectures. Experimental results on 3 different datasets show that our method DVMS outperforms competitors adapted from the self-driving domain by up to 37% on prediction horizons up to 5 sec., at lower computational and memory costs. Finally, we design a method to estimate the respective likelihoods of the multiple predicted trajectories, by exploiting the stationarity of the distribution of the prediction error over the latent space. Experimental results on 3 datasets show the quality of these estimates, and how they depend on the video category. CCS Concepts: • Human-centered computing → Virtual reality; • Information systems → Multimedia streaming; • Computing methodologies → Neural networks.

“…The scene plays an important role in vehicle trajectory prediction as it constrains the future positions of the agents. Therefore, modeling the scene is common in spite of some human trajectory prediction models [13,39]. In order to reason over the scene in the predictions, some suggested using a semantic segmented map to build circular distributions and outputting the most probable regions [21].…”

Section: Related Work (mentioning)

confidence: 99%

Vehicle trajectory prediction works, but not everywhere

Bahari, Saadatnejad, Rahimi, et al. 2021

Preprint

Self Cite

Vehicle trajectory prediction is nowadays a fundamental pillar of self-driving cars. Both the industry and research communities have acknowledged the need for such a pillar by running public benchmarks. While state-of-the-art methods are impressive, i.e., they have no off-road prediction, their generalization to cities outside of the benchmark is unknown. In this work, we show that those methods do not generalize to new scenes. We present a novel method that automatically generates realistic scenes that cause state-of-the-art models to go off-road. We frame the problem through the lens of adversarial scene generation. We promote a simple yet effective generative model based on atomic scene generation functions along with physical constraints. Our experiments show that more than 60% of the existing scenes from the current benchmarks can be modified in a way to make prediction methods fail (predicting off-road). We further show that (i) the generated scenes are realistic since they do exist in the real world, and (ii) can be used to make existing models robust by 30-40%. Code is available at https://s-attack.github.io/.
