SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints

Sadeghian, Amir; Kosaraju, Vineet; Sadeghian, Ali; Hirose, Noriaki; Rezatofighi, Hamid; Savarese, Silvio

doi:10.1109/cvpr.2019.00144

Cited by 877 publications

(864 citation statements)

References 37 publications

Supporting

Mentioning

862

Contrasting

Unclassified

Order By: Relevance

“…Social GAN [8] Sophie [10] Desire [12] Ours Linear Regressor Social Forces [33] Social LSTM [5] CAR-NET [ Fig. 4.…”

Section: Methodsmentioning

confidence: 99%

“…In addition to location co-ordinates, some approaches also incorporate auxiliary information such as the head pose of pedestrians [9], [14] while encoding past motion. Many approaches jointly model the past motion of multiple agents in the scene to capture interaction between agents [5], [15], [12], [10], [7], [11]. This is typically done by pooling the RNN states of individual agents in a social tensor [5], [12], [11], using graph neural networks [16] or by modeling pairwise distances between agents along with max pooling [8], [10], [7].…”

Section: Related Studiesmentioning

confidence: 99%

See 1 more Smart Citation

Scene Compliant Trajectory Forecast With Agent-Centric Spatio-Temporal Grids

Ridel

Deo

Wolf

et al. 2020

IEEE Robot. Autom. Lett.

View full text Add to dashboard Cite

Forecasting long-term human motion is a challenging task due to the non-linearity, multi-modality and inherent uncertainty in future trajectories. The underlying scene and past motion of agents can provide useful cues to predict their future motion. However, the heterogeneity of the two inputs poses a challenge for learning a joint representation of the scene and past trajectories. To address this challenge, we propose a model based on grid representations to forecast agent trajectories. We represent the past trajectories of agents using binary 2-D grids, and the underlying scene as a RGB birds-eye view (BEV) image, with an agent-centric frame of reference. We encode the scene and past trajectories using convolutional layers and generate trajectory forecasts using a Convolutional LSTM (ConvLSTM) decoder. Results on the publicly available Stanford Drone Dataset (SDD) show that our model outperforms prior approaches and outputs realistic future trajectories that comply with scene structure and past motion.

show abstract

“…Social GAN [8] Sophie [10] Desire [12] Ours Linear Regressor Social Forces [33] Social LSTM [5] CAR-NET [ Fig. 4.…”

Section: Methodsmentioning

confidence: 99%

Section: Related Studiesmentioning

confidence: 99%

Scene Compliant Trajectory Forecast With Agent-Centric Spatio-Temporal Grids

Ridel

Deo

Wolf

et al. 2020

IEEE Robot. Autom. Lett.

View full text Add to dashboard Cite

show abstract

“…Social GAN [35] defines a spatial pooling for motion prediction. SoPhie [18] introduces an attentive GAN to predict individual trajectories leveraging the physical Constraints. Although obtained impressive results, these top-down methods pose limitations that make them inapplicable to egocentric applications of selfdriving scenarios.…”

Section: Related Workmentioning

confidence: 99%

“…Although they have obtained convincing performance on several benchmarks, completion of such tasks is not enough for human-like driving. On the other hand, trajectory prediction [11]- [18] addresses the problem to some extent by predicting the potential future position of the pedestrian. But predicting trajectories with high confidence long-enough into the future is a very challenging task as many different and subtle factors change the trajectories of pedestrians.…”

Section: Introductionmentioning

confidence: 99%

Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction

Liu

Adeli

Cao

et al. 2020

IEEE Robot. Autom. Lett.

141

View full text Add to dashboard Cite

Reasoning over visual data is a desirable capability for robotics and vision-based applications. Such reasoning enables forecasting the next events or actions in videos. In recent years, various models have been developed based on convolution operations for prediction or forecasting, but they lack the ability to reason over spatiotemporal data and infer the relationships of different objects in the scene. In this paper, we present a framework based on graph convolution to uncover the spatiotemporal relationships in the scene for reasoning about pedestrian intent. A scene graph is built on top of segmented object instances within and across video frames. Pedestrian intent, defined as the future action of crossing or not-crossing the street, is very crucial piece of information for autonomous vehicles to navigate safely and more smoothly. We approach the problem of intent prediction from two different perspectives and anticipate the intention-to-cross within both pedestrian-centric and location-centric scenarios. In addition, we introduce a new dataset designed specifically for autonomousdriving scenarios in areas with dense pedestrian populations: the Stanford-TRI Intent Prediction (STIP) dataset. Our experiments on STIP and another benchmark dataset show that our graph modeling framework is able to predict the intention-to-cross of the pedestrians with an accuracy of 79.10% on STIP and 79.28% on Joint Attention for Autonomous Driving (JAAD) dataset up to one second earlier than when the actual crossing happens. These results outperform baseline and previous work. Please refer to http://stip.stanford.edu/ for the dataset and code.Index Terms-spatiotemporal graphs, forecasting, graph neural networks, autonomous-driving. Recent work [19]-[23] introduced pedestrian intent prediction and have typically tackled the problem by observing pedestrian-specific features such as location, velocity, and

show abstract

“…For example, sequence models that use Long Short Term Memory (LSTM) recurrent neural networks like Social LSTM [1] and other [12,13] are capable to encode the Human-Robot interactions and Human-Human interactions to improve the predictions. Other techniques are based in generative models like Social-GAN [11] or SoPhie [21] that use pooling modules and attention modules. This type of techniques offers very useful information in navigation tasks but not take into account all the elements in navigation tasks like obstacles, actions, kinematics or goals.…”

Section: Navigation Based On Deep Reinforcement Learningmentioning

confidence: 99%

Effects of a Social Force Model Reward in Robot Navigation Based on Deep Reinforcement Learning

Viyuela¹,

Sanfeliu²

2019

Advances in Intelligent Systems and Computing

View full text Add to dashboard Cite

In this paper is proposed an inclusion of the Social Force Model (SFM) into a concrete Deep Reinforcement Learning (RL) framework for robot navigation. These types of techniques have demonstrated to be useful to deal with different types of environments to achieve a goal. In Deep RL, a description of the world to describe the states and a reward adapted to the environment are crucial elements to get the desire behaviour and achieve a high performance. For this reason, this work adds a dense reward function based on SFM and uses the forces in the states like an additional description. Furthermore, obstacles are added to improve the behaviour of works that only consider moving agents. This SFM inclusion can offer a better description of the obstacles for the navigation. Several simulations have been done to check the effects of these modifications in the average performance.

show abstract

SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints

Cited by 877 publications

References 37 publications

Scene Compliant Trajectory Forecast With Agent-Centric Spatio-Temporal Grids

Scene Compliant Trajectory Forecast With Agent-Centric Spatio-Temporal Grids

Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction

Effects of a Social Force Model Reward in Robot Navigation Based on Deep Reinforcement Learning

Contact Info

Product

Resources

About