Gábor Orosz scite author profile

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would likely fail or break before an optimal controller could be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms in learning high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We use Gaussian Processes (GPs) to model the system dynamics and their uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and explores the policy space more efficiently. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that it attains much greater sample efficiency in learning than other state-of-the-art algorithms while maintaining safety during the entire learning process.
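As a rough illustration of how a CBF-based controller can minimally override an RL action using a GP model of unknown dynamics, the sketch below uses a hypothetical scalar system x_{t+1} = x_t + dt·(u_t + d(x_t)) with unknown drift d, safe set h(x) = x_max − x ≥ 0, and the discrete-time barrier condition h(x_{t+1}) ≥ (1 − eta)·h(x_t), which rearranges to u_t ≤ eta·h(x_t)/dt − d(x_t). A GP fitted to samples of d supplies the high-probability bound d ≤ mean + k_sigma·std. The toy system, the kernel, k_sigma and every parameter value are assumptions made here, not details of RL-CBF, which treats general control-affine dynamics through a quadratic program; with one input and one affine constraint, that program's minimal-deviation solution reduces to the clipping used below.

```python
import numpy as np


def rbf_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    diff = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (diff / length) ** 2)


class GP1D:
    """Minimal GP regression for a scalar unknown drift d(x) (hypothetical helper)."""

    def __init__(self, x_train, y_train, noise=1e-2, length=1.0, var=1.0):
        self.x = np.asarray(x_train, dtype=float)
        self.length, self.var = length, var
        k = rbf_kernel(self.x, self.x, length, var) + noise**2 * np.eye(len(self.x))
        self.chol = np.linalg.cholesky(k)  # K = L L^T
        y = np.asarray(y_train, dtype=float)
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, y))

    def predict(self, x_query):
        """Posterior mean and standard deviation of the latent drift at x_query."""
        xq = np.atleast_1d(np.asarray(x_query, dtype=float))
        k_star = rbf_kernel(xq, self.x, self.length, self.var)
        mean = k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = self.var - np.sum(v**2, axis=0)
        return mean, np.sqrt(np.maximum(var, 0.0))


def safety_filter(x, u_rl, gp, x_max=2.0, dt=0.05, eta=0.2, k_sigma=3.0):
    """Minimally modify a scalar RL action so that h(x) = x_max - x satisfies
    h(x_next) >= (1 - eta) * h(x) for x_next = x + dt * (u + d(x)),
    with the unknown drift bounded by d(x) <= mean + k_sigma * std."""
    mean, std = gp.predict(x)
    h = x_max - x
    u_upper = eta * h / dt - (mean[0] + k_sigma * std[0])
    return min(u_rl, u_upper)  # closed-form QP solution: one input, one constraint


# Illustrative rollout: an aggressive "RL" action keeps pushing toward the boundary.
def true_drift(x):
    return 0.5 * np.sin(x)  # unknown to the controller, only sampled for the GP


x_train = np.linspace(-3.0, 3.0, 50)
gp = GP1D(x_train, true_drift(x_train))

x, xs = 0.0, [0.0]
for _ in range(200):
    u = safety_filter(x, u_rl=2.0, gp=gp)
    x = x + 0.05 * (u + true_drift(x))
    xs.append(x)
```

The guarantee is only as strong as the GP bound: if d(x) ever exceeds mean + k_sigma·std, the state can overshoot x_max slightly, which is the practical meaning of "with high probability" in the abstract.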


Traffic jams: dynamics and control

Orosz

Wilson

Stépán

2010

Phil. Trans. R. Soc. A.

350

215


This introductory paper reviews the current state-of-the-art scientific methods used for modelling, analysing and controlling the dynamics of vehicular traffic. Possible mechanisms underlying traffic jam formation and propagation are presented from a dynamical viewpoint. Stable and unstable motions are described that may give the skeleton of traffic dynamics, and the effects of driver behaviour are emphasized in determining the emergent state in a vehicular system. At appropriate points, references are provided to the papers published in the corresponding Theme Issue.

Keywords: vehicular traffic; congestion; stop-and-go waves; Hopf bifurcation; driver reaction time; unstable waves

I asked Fermi whether he was not impressed by the agreement between our calculated numbers and his measured numbers. He replied, 'How many arbitrary parameters did you use for your calculations?' I thought for a moment about our cut-off procedures and said, 'Four'. He said, 'I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk'. (Dyson 2004, p. 297)

Background and challenges

The introduction of the assembly line in the automotive industry about a century ago allowed the mass production of automobiles, which, in turn, revolutionized land transportation. At the same time, a problem was also generated that has not yet been resolved: traffic congestion. Since then, researchers from many different disciplines (mathematics, physics and engineering) have targeted this problem, often using sophisticated mathematical tools brought from their own areas of expertise. Also, analogies between traffic flow and other flows (fluid flow, gas flow and granular flow) were established. Although such analogies may help scientists to gain understanding of vehicular systems, it is becoming more and more obvious that traffic flows like no other flow in the Newtonian universe.

To date, a vast number of different models have been constructed, but still no first principles have been established to guide the modelling procedure (if such principles exist at all). In many cases, authors claimed that the developed model described traffic better than models prior to that point, and such claims were often justified by fitting the models to empirical data. The quote at the beginning of this paper tries to illustrate, without questioning the importance of any specific model, that the above approach may easily lead to research capturing, but also missing, some essential characteristics. We believe that another way to conduct research in traffic can be by studying general classes of models and classifying their qualitative dynamical features (including 'hidden' unstable motions) when varying model parameters. To...

*Author for correspondence (gabor@engineering.ucsb.edu). One contribution of 10 to a Theme Issue 'Traffic jams: dynamics and control'. This journal is © 2010 The Royal Society.
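The paper surveys classes of models rather than prescribing one, so the following is only an illustration of a "stable versus unstable motion" analysis. It assumes the optimal velocity car-following model on a ring road of N cars (a choice made here, and one that omits the driver reaction-time delay the paper emphasizes): car i follows car i+1, dh_i/dt = v_{i+1} − v_i and dv_i/dt = alpha·(V(h_i) − v_i), with the hypothetical choice V(h) = tanh(h − 2) + tanh(2). Linearizing about uniform flow with a travelling-wave perturbation of wavenumber theta = 2πk/N gives lambda² + alpha·lambda + alpha·V'(h*)·(1 − e^{i·theta}) = 0. A complex pair reaches the imaginary axis (with frequency V'(h*)·sin theta ≠ 0) when V'(h*) = alpha/(1 + cos theta), so the first mode to destabilize is theta = 2π/N; a complex crossing of this kind is consistent with the Hopf bifurcation listed among the keywords. The sketch checks the numerical Jacobian eigenvalues against that formula.

```python
import numpy as np


def ov_slope(h):
    """V'(h) for the assumed optimal velocity function V(h) = tanh(h - 2) + tanh(2)."""
    return 1.0 / np.cosh(h - 2.0) ** 2


def ring_jacobian(n_cars, h_star, alpha):
    """Jacobian of dh_i/dt = v_{i+1} - v_i, dv_i/dt = alpha * (V(h_i) - v_i)
    at uniform flow h_i = h_star; state ordered as (h_1..h_n, v_1..v_n)."""
    n = n_cars
    jac = np.zeros((2 * n, 2 * n))
    for i in range(n):
        ahead = (i + 1) % n  # car i follows car i + 1 around the ring
        jac[i, n + ahead] = 1.0
        jac[i, n + i] = -1.0
        jac[n + i, i] = alpha * ov_slope(h_star)
        jac[n + i, n + i] = -alpha
    return jac


def max_growth_rate(n_cars, h_star, alpha):
    """Largest real part among the Jacobian eigenvalues."""
    return float(np.max(np.linalg.eigvals(ring_jacobian(n_cars, h_star, alpha)).real))


def uniform_flow_unstable(n_cars, h_star, alpha):
    """Analytic criterion V'(h*) > alpha / (1 + cos(2 pi / N)) for the longest wave."""
    return ov_slope(h_star) > alpha / (1.0 + np.cos(2.0 * np.pi / n_cars))


for alpha in (1.0, 3.0):
    print(alpha, max_growth_rate(20, 2.0, alpha), uniform_flow_unstable(20, 2.0, alpha))
```

With N = 20 cars at spacing h* = 2 (so V'(h*) = 1), the formula predicts instability for alpha = 1 and stability for alpha = 3. The zero eigenvalue that remains in the stable case reflects the conserved total ring length, not an instability.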


Dynamics of connected vehicle systems with delayed acceleration feedback

Orosz

2014

Transportation Research Part C: Emerging Technologies

392

129


Dynamics on Networks of Cluster States for Globally Coupled Phase Oscillators

Ashwin,

Orosz,

Wordsworth

et al. 2007

SIAM J. Appl. Dyn. Syst.

127


Abstract. Systems of globally coupled phase oscillators can have robust attractors that are heteroclinic networks. We investigate such a heteroclinic network between partially synchronized states where the phases cluster into three groups. For the coupling considered there exist 30 different three-cluster states in the case of five oscillators. We study the structure of the heteroclinic network and demonstrate that it is possible to navigate around the network by applying small impulsive inputs to the oscillator phases. This paper shows that such navigation may be done reliably even in the presence of noise and frequency detuning, as long as the input amplitude dominates the noise strength and the detuning magnitude, and the time between the applied pulses is in a suitable range. Furthermore, we show that, by exploiting the heteroclinic dynamics, frequency detuning can be encoded as a spatiotemporal code. By changing a coupling parameter we can stabilize the three-cluster states and replace the heteroclinic network by a network of excitable three-cluster states. The resulting "excitable network" has the same structure as the heteroclinic network and navigation around the excitable network is also possible by applying large impulsive inputs. We also discuss features that have implications for related models of neural activity.
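Two parts of this abstract lend themselves to a short sketch. First, the figure of 30 three-cluster states for five oscillators agrees with the multinomial coefficient 5!/(2!·2!·1!) = 30, i.e. assigning five labelled oscillators to three distinguishable clusters of sizes 2, 2 and 1 (5 choices of the isolated oscillator × 6 ways to split the remaining four into an ordered pair of pairs); reading the paper's count this way is an assumption. Second, the dynamics can be sketched as globally coupled phase oscillators, dθ_i/dt = ω_i + (1/N)·Σ_j g(θ_i − θ_j), with weak noise and optional impulsive phase kicks of the kind used for navigation. The coupling function g(φ) = −sin(φ + α) + r·sin(2φ), its parameter values, the step size and the cluster-detection tolerance are illustrative assumptions, not the paper's settings.

```python
import math
from itertools import product

import numpy as np


def multinomial_count(n, sizes):
    """Ways to assign n labelled oscillators to distinguishable clusters of given sizes."""
    return math.factorial(n) // math.prod(math.factorial(s) for s in sizes)


def brute_force_count(n, sizes):
    """Cross-check by enumerating every map oscillator -> cluster label."""
    k = len(sizes)
    return sum(
        1
        for labels in product(range(k), repeat=n)
        if tuple(labels.count(c) for c in range(k)) == tuple(sizes)
    )


def coupling(phi, alpha=1.8, r=0.2):
    """Assumed coupling g(phi) = -sin(phi + alpha) + r * sin(2 phi); parameters hypothetical."""
    return -np.sin(phi + alpha) + r * np.sin(2.0 * phi)


def simulate(theta0, omega, dt=0.01, steps=5000, noise=0.0, pulses=None, seed=0):
    """Euler-Maruyama integration of
    d theta_i = [omega_i + (1/N) sum_j g(theta_i - theta_j)] dt + noise dW_i.
    `pulses` maps a step index to an array of impulsive phase kicks."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    omega = np.broadcast_to(np.asarray(omega, dtype=float), theta.shape)
    pulses = pulses or {}
    for step in range(steps):
        if step in pulses:
            theta = theta + np.asarray(pulses[step], dtype=float)
        diff = theta[:, None] - theta[None, :]
        drift = omega + coupling(diff).mean(axis=1)
        theta = theta + drift * dt + noise * math.sqrt(dt) * rng.standard_normal(theta.shape)
    return np.mod(theta, 2.0 * np.pi)


def count_clusters(theta, tol=0.1):
    """Count groups of phases separated by gaps larger than tol around the circle."""
    s = np.sort(np.mod(theta, 2.0 * np.pi))
    gaps = np.diff(np.append(s, s[0] + 2.0 * np.pi))
    return max(1, int(np.sum(gaps > tol)))


print(multinomial_count(5, (2, 2, 1)), brute_force_count(5, (2, 2, 1)))  # 30 30

rng = np.random.default_rng(1)
theta_end = simulate(rng.uniform(0.0, 2.0 * np.pi, 5), omega=1.0, noise=1e-3)
print(count_clusters(theta_end))
```

Whether the noisy run settles near a three-cluster state depends on the assumed coupling parameters, so the printed cluster count should be treated as exploratory rather than as a reproduction of the paper's results.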



Gábor Orosz

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Traffic jams: dynamics and control

Dynamics of connected vehicle systems with delayed acceleration feedback

Dynamics on Networks of Cluster States for Globally Coupled Phase Oscillators
