“…Indeed, algorithms like MuZero and its predecessors [Silver et al., 2017, Oh et al., 2017, Schrittwieser et al., 2020] never approximate reward functions and transition models with respect to the raw image observations generated by the environment, but instead incrementally learn some latent representation of state upon which a corresponding model is approximated for planning. This philosophy is born out of several years of work that elucidates the importance of state abstraction as a key tool for discarding the irrelevant information encoded in environment states and addressing the challenge of generalization for sample-efficient reinforcement learning in large-scale environments [Whitt, 1978, Bertsekas and Castañon, 1989, Dean and Givan, 1997, Ferns et al., 2004, Jong and Stone, 2005, Li et al., 2006, Van Roy, 2006, Ferns et al., 2012, Jiang et al., 2015, Abel et al., 2016, 2018, Dong et al., 2019, Du et al., 2019, Arumugam and Van Roy, 2020, Misra et al., 2020, Agarwal et al., 2020, Abel et al., 2020, Abel, 2020, Dong et al., 2021]. In this section, we briefly introduce a small extension of VSRL that builds on these insights to accommodate lossy MDP compressions defined on a simpler, abstract state space (also referred to as the aleatoric or situational state by Lu et al. [2021], Dong et al. [2021]).…”
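To make the latent-model decomposition described above concrete, the sketch below shows one minimal way such an architecture could be organized: a learned encoder maps raw observations to an abstract (latent) state, and the reward and transition models are defined on that abstract space rather than on the raw observations. This is not the MuZero or VSRL implementation; the use of PyTorch, the module names (`AbstractModel`, `encoder`, `reward`, `transition`), and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AbstractModel(nn.Module):
    """Illustrative sketch: model components defined on an abstract state space."""

    def __init__(self, obs_dim: int, num_actions: int, latent_dim: int = 32):
        super().__init__()
        # phi: raw observation -> abstract (latent) state
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )
        # reward model defined on (abstract state, action), not raw observations
        self.reward = nn.Linear(latent_dim + num_actions, 1)
        # transition model maps (abstract state, action) -> next abstract state
        self.transition = nn.Linear(latent_dim + num_actions, latent_dim)
        self.num_actions = num_actions

    def forward(self, obs: torch.Tensor, action: torch.Tensor):
        z = self.encoder(obs)  # abstract state
        a = nn.functional.one_hot(action, self.num_actions).float()
        za = torch.cat([z, a], dim=-1)
        # predicted reward and predicted next abstract state, both for planning
        return self.reward(za), self.transition(za)

# Usage (hypothetical dimensions):
#   model = AbstractModel(obs_dim=4, num_actions=2)
#   r_hat, z_next = model(torch.randn(1, 4), torch.tensor([0]))
```

The point of the sketch is only the separation of concerns: the encoder absorbs the burden of compressing the environment state, so planning quantities (reward and transition predictions) never touch the raw observation space.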