“…Another line of work, known as Deep V-Learning, first uses supervised learning and then RL to learn a value function for path planning, assuming known state transitions for all agents [8], [11], [12], [20]. To remove this assumption, decentralized structural-RNN (DS-RNN) uses model-free RL to train the robot policy from scratch [13]. To model the interactions between the robot and humans, these RL-based methods use long short-term memory (LSTM) encoders, attention mechanisms, and spatio-temporal graphs.…”