2022
DOI: 10.3390/ai3020015
Reinforcement Learning Your Way: Agent Characterization through Policy Regularization

Abstract: The increased complexity of state-of-the-art reinforcement learning (RL) algorithms has resulted in an opacity that inhibits explainability and understanding. This has led to the development of several post hoc explainability methods that aim to extract information from learned policies, thus aiding explainability. These methods rely on empirical observations of the policy, and thus aim to generalize a characterization of agents’ behaviour. In this study, we have instead developed a method to imbue agents’ pol…

Cited by 9 publications (9 citation statements)
References 18 publications
“…Interpretation of RL agents typically follows model training [26,27,28]; our ambition is to impose a desired characteristic behaviour during training, thus making it an intrinsic property of the agent. Based on a prior that defines a desired behaviour, we extend the deep deterministic policy gradient (DDPG [29]) objective function with a regularisation term [15]. Formally, for each agent i, this objective function is given by:…”
Section: Related Work (mentioning)
confidence: 99%
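The quoted objective function is truncated above. As a rough, non-authoritative sketch only (the divergence D, the prior p_i, and the weight λ below are illustrative assumptions, not the paper's notation), a DDPG objective regularised towards a behavioural prior can take a form such as

$$ J_i(\theta) \,=\, \mathbb{E}_{s}\big[\, Q_i\big(s,\ \pi_{\theta,i}(s)\big) \,\big] \;-\; \lambda\, D\big(\pi_{\theta,i}(s),\, p_i\big), $$

where $\pi_{\theta,i}$ is agent $i$'s deterministic policy, $p_i$ is the prior encoding the desired behaviour, and $D$ measures how far the policy's actions deviate from that prior.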
“…We have previously investigated the interpretability of systems of multiple RL agents [15]. A regularisation term in the objective function imposed a desired agent behaviour during training.…”
Section: Introduction (mentioning)
confidence: 99%
“…Policy regularization has been shown to be helpful and never detrimental to convergence [21]. Although most policy regularization methods aim to improve learning performance, they can also control the learning process and imbue the policy with an intrinsic behavior [22]. Here, the objective function is regularized with a predefined prior action distribution that defines a desirable characteristic:…”
Section: Background and Related Work (mentioning)
confidence: 99%
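To make the regularisation idea described above concrete, the following is a minimal PyTorch-style sketch, assuming a deterministic DDPG-style actor, an L2 penalty toward a fixed prior action, and hypothetical names (`prior_action`, `reg_weight`, `actor_loss`); it illustrates the general technique, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch: a DDPG-style actor update in which the usual
# policy objective is augmented with a regularisation term that pulls
# the policy's actions toward a predefined prior. Shapes, the L2
# penalty, and `reg_weight` are assumptions for illustration only.

state_dim, action_dim = 8, 2

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
optimiser = torch.optim.Adam(actor.parameters(), lr=1e-3)

# Prior encoding the desired characteristic behaviour, e.g. a bias
# toward one action component (hypothetical values).
prior_action = torch.tensor([0.8, 0.0])
reg_weight = 0.1  # trade-off between return and adherence to the prior

def actor_loss(states: torch.Tensor) -> torch.Tensor:
    actions = actor(states)
    # Standard DDPG actor objective: maximise the critic's value estimate.
    q_values = critic(torch.cat([states, actions], dim=-1))
    # Regularisation: penalise deviation from the prior action.
    reg = ((actions - prior_action) ** 2).sum(dim=-1)
    return (-q_values.squeeze(-1) + reg_weight * reg).mean()

# One illustrative gradient step on a batch of random states.
batch = torch.randn(32, state_dim)
loss = actor_loss(batch)
optimiser.zero_grad()
loss.backward()
optimiser.step()
```

Increasing `reg_weight` shifts the learned policy further toward the prior, at the potential cost of return; this is the trade-off between performance and imposed characteristic behaviour that the citing works describe.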
“…These agents learned to either prefer left turns, right turns, or to avoid going straight by taking a zig-zag approach to their destination. In contrast to constrained RL, which avoids certain states, the policy regularization in [22] encourages certain actions irrespective of the state and is a new direction for RL.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…We distinguish between explainability and interpretability: explainability refers to a symbolic representation of the knowledge a model has learned, while interpretability is necessary for reasoning about a model's predictions. We have previously investigated the interpretability of systems of multiple RL agents [15]: a regularisation term in the objective function instilled a desired agent behaviour during training. For our current purpose of prosperity management, we create prototypical RL agents which have intrinsic affinities for certain asset classes.…”
Section: Introduction (mentioning)
confidence: 99%