Supratik Paul scite author profile

Supratik Paul

5 Publications

55 Citation Statements Received

80 Citation Statements Given


Affiliations

Nomor Research (Germany), University of Oxford

Publications


Learning From Demonstration in the Wild

Behbahani, Shiarlis, Chen et al. 2019


Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical. It has succeeded in a wide range of problems but typically relies on manually generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviours that were occurring anyway using sensors that were already deployed for another purpose, e.g., traffic camera footage capturing demonstrations of natural behaviour of vehicles, cyclists, and pedestrians. We propose video to behaviour (ViBe), a new approach to learn models of behaviour from unlabelled raw video data of a traffic scene collected from a single, monocular, initially uncalibrated camera with ordinary resolution. Our approach calibrates the camera, detects relevant objects, tracks them through time, and uses the resulting trajectories to perform LfD, yielding models of naturalistic behaviour. We apply ViBe to raw videos of a traffic intersection and show that it can learn purely from videos, without additional expert knowledge.


Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving

Bronstein, Palatucci, Notz et al. 2022


Learning from Demonstration in the Wild

Behbahani, Shiarlis, Chen et al. 2018. Preprint


Fast Efficient Hyperparameter Tuning for Policy Gradients

Paul, Kurin, Whiteson 2019. Preprint


Alternating Optimisation and Quadrature for Robust Control

Paul, Chatzilygeroudis, Ciosek et al. 2018. AAAI


Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This paper considers the problem of finding a robust policy while taking into account the impact of environment variables. We present Alternating Optimisation and Quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy. Experimental results across different domains show that ALOQ can learn more efficiently and robustly than existing methods.


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

