We focus on the problem of predicting future states of entities in complex, real-world driving scenarios. Previous research has used low-level signals to predict over short time horizons, and has not addressed how to leverage two key assets relied upon heavily by industry self-driving systems: (1) large 3D perception efforts, which provide highly accurate 3D states of agents with rich attributes, and (2) detailed and accurate semantic maps of the environment (lanes, traffic lights, crosswalks, etc.). We present a unified representation that encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context. This enables learning entity-entity and entity-environment interactions with simple, feed-forward computations at each timestep within an overall temporal model of an agent's behavior. We propose different ways of modeling the future as a distribution over future states using standard supervised learning. We introduce a novel dataset providing industry-grade rich perception and semantic inputs, and empirically show that we can effectively learn fundamentals of driving behavior.
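As a concrete illustration of the spatial-grid representation described in this abstract, the sketch below rasterizes map elements and agent positions into channels of a single tensor that a convolutional model could consume. The grid size, resolution, and channel layout are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

GRID = 128           # cells per side, centered on the target agent (assumed)
RES = 0.5            # meters per cell (assumed)

def to_cell(points_xy, center_xy):
    """Map world-frame (x, y) points in meters to integer grid indices."""
    ij = np.floor((points_xy - center_xy) / RES).astype(int) + GRID // 2
    return np.clip(ij, 0, GRID - 1)

def rasterize(center_xy, lane_pts, crosswalk_pts, agent_pts):
    """Stack semantic layers into one (channels, GRID, GRID) tensor."""
    grid = np.zeros((3, GRID, GRID), dtype=np.float32)
    for ch, pts in enumerate((lane_pts, crosswalk_pts, agent_pts)):
        if len(pts):
            i, j = to_cell(np.asarray(pts, dtype=float), center_xy).T
            grid[ch, i, j] = 1.0  # binary occupancy per semantic channel
    return grid

# Toy scene: one lane polyline, one crosswalk point, two nearby agents.
scene = rasterize(np.array([10.0, 5.0]),
                  lane_pts=[(10.0, y) for y in np.arange(0.0, 20.0, 0.5)],
                  crosswalk_pts=[(12.0, 8.0)],
                  agent_pts=[(8.0, 4.0), (14.0, 6.0)])
print(scene.shape)   # (3, 128, 128) -- ready for a conv net to fuse context
```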
We study Thompson sampling (TS) in online decision-making problems where the uncertain environment is sampled from a mixture distribution. This is relevant to multi-task settings, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, dubbed MixTS, and develop a novel, general technique for analyzing the regret of TS with such priors. We apply this technique to derive Bayes regret bounds for MixTS in both linear bandits and tabular Markov decision processes (MDPs). Our regret bounds reflect the structure of the problem and depend on the number of components and the confidence width of each component of the prior. Finally, we demonstrate the empirical effectiveness of MixTS in both synthetic and real-world experiments.
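To make the MixTS idea concrete, here is a minimal sketch of Thompson sampling initialized with a Gaussian-mixture prior on a toy K-armed Gaussian bandit. The bandit setting and all constants are assumptions for illustration; the paper itself analyzes linear bandits and tabular MDPs.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 5, 3                  # arms, mixture components (assumed)
sigma = 1.0                  # known reward-noise std (assumed)

# Per-arm mixture prior: weights w[a, j], means mu[a, j], variances s2[a, j].
w  = np.full((K, M), 1.0 / M)
mu = rng.normal(0.0, 1.0, size=(K, M))
s2 = np.full((K, M), 1.0)

true_means = rng.normal(0.0, 1.0, size=K)

for t in range(1000):
    # Sample one mean per arm from its posterior mixture, act greedily.
    comp = np.array([rng.choice(M, p=w[a]) for a in range(K)])
    theta = rng.normal(mu[np.arange(K), comp],
                       np.sqrt(s2[np.arange(K), comp]))
    a = int(np.argmax(theta))
    r = rng.normal(true_means[a], sigma)

    # Conjugate update of arm a's mixture: each Gaussian component gets a
    # standard Gaussian-Gaussian update, and weights are reweighted by the
    # marginal likelihood of r under that component.
    like = np.exp(-0.5 * (r - mu[a]) ** 2 / (s2[a] + sigma**2)) \
           / np.sqrt(2 * np.pi * (s2[a] + sigma**2))
    w[a] *= like
    w[a] /= w[a].sum()
    post_var = 1.0 / (1.0 / s2[a] + 1.0 / sigma**2)
    mu[a] = post_var * (mu[a] / s2[a] + r / sigma**2)
    s2[a] = post_var
```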
In the real world, some of the most complex settings for learned agents involve interaction with humans, who often exhibit suboptimal, unpredictable behavior due to sophisticated biases. Agents that interact with people in such settings end up influencing the actions that these people take. Our goal in this work is to enable agents to leverage that influence to improve the human's performance in collaborative tasks, as the task unfolds. Unlike prior work, we do not assume online training with people (which tends to be too expensive and unsafe), nor access to a high-fidelity simulator of the environment. Our idea is that by taking a variety of previously observed human-human interaction data and labeling it with the task reward, offline reinforcement learning (RL) can learn to combine components of behavior, and uncover actions that lead to more desirable human actions. First, we show that offline RL can learn strategies to influence and improve human behavior, despite those strategies not appearing in the dataset, by utilizing components of diverse, suboptimal interactions. In addition, we demonstrate that offline RL can learn influence that adapts with humans, thus achieving long-term coordination with them even when their behavior changes. We evaluate our proposed method with real people in the Overcooked collaborative benchmark domain, and demonstrate successful improvement in human performance.
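As a rough illustration of the pipeline this abstract describes, the sketch below runs a pessimistic tabular offline Q-learning pass over logged interaction tuples labeled with the task reward. The penalty-based pessimism and all constants are assumptions standing in for the paper's offline RL algorithm, not its actual method.

```python
import numpy as np
from collections import defaultdict

GAMMA, ALPHA, PENALTY = 0.99, 0.1, 1.0   # illustrative hyperparameters

def offline_q_learning(dataset, n_actions, sweeps=50):
    """dataset: list of (state, action, reward, next_state, done) tuples
    extracted from logged human-human interactions."""
    Q = defaultdict(lambda: np.zeros(n_actions))
    counts = defaultdict(lambda: np.zeros(n_actions))
    for s, a, r, s2, done in dataset:
        counts[s][a] += 1                 # data support per (state, action)
    for _ in range(sweeps):
        for s, a, r, s2, done in dataset:
            # Pessimistic backup: next-state actions unseen in the data are
            # penalized, keeping the learned policy close to supported
            # behavior while still recombining components across episodes.
            pess = Q[s2] - PENALTY * (counts[s2] == 0)
            target = r + (0.0 if done else GAMMA * pess.max())
            Q[s][a] += ALPHA * (target - Q[s][a])
    return Q

# Toy usage: two logged transitions in a 2-action task.
data = [("s0", 0, 0.0, "s1", False), ("s1", 1, 1.0, "s1", True)]
Q = offline_q_learning(data, n_actions=2)
print(Q["s0"], Q["s1"])
```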