2018
DOI: 10.48550/arxiv.1809.06364
Preprint

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Abstract: Many reinforcement-learning researchers treat the reward function as a part of the environment, meaning that the agent can only know the reward of a state if it encounters that state in a trial run. However, we argue that this is an unnecessary limitation and instead, the reward function should be provided to the learning algorithm. The advantage is that the algorithm can then use the reward function to check the reward for states that the agent hasn't even encountered yet. In addition, the algorithm can simul…
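As a rough, hedged illustration of the idea in the abstract: if the multi-objective reward function is handed to the learning algorithm, the algorithm can query reward vectors even for states the agent has never visited. The Python sketch below is only a sketch under that assumption; the names reward_fn and evaluate_unvisited are illustrative and do not come from the paper.

import numpy as np

def reward_fn(state, action):
    # Illustrative multi-objective reward: one entry per objective.
    progress = float(state[0]) + 0.1 * action   # toy objective 1
    effort = -abs(float(action))                # toy objective 2
    return np.array([progress, effort])

def evaluate_unvisited(reward_fn, candidate_states, actions):
    # Because the reward function is provided to the algorithm, it can be
    # queried for (state, action) pairs the agent never experienced in a rollout.
    return {(tuple(s), a): reward_fn(s, a)
            for s in candidate_states
            for a in actions}

# Reward vectors for hypothetical, never-visited states.
hypothetical = [np.array([0.5, 0.0]), np.array([2.0, -1.0])]
reward_table = evaluate_unvisited(reward_fn, hypothetical, actions=[0, 1])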

Cited by 7 publications (17 citation statements)
References 5 publications

“…We use a generalized, intrinsically Multi-Objective RL strategy for stock and cryptocurrency trading. We implement this by considering extensions of the Multi-Objective Deep Q-Learning RL algorithm with experience replay and target network stabilization given in [6], and deploying it on the Nifty50 stock index and BTCUSD trading pair.…”
Section: Our Contribution (mentioning)
confidence: 99%
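The statement above pairs a multi-objective deep Q-learning agent with experience replay and target-network stabilization. Below is a minimal PyTorch sketch of the target-network part, assuming a Q-network that outputs one Q-vector per action (one component per objective); the class name, layer sizes, and update scheme are assumptions, not details taken from the cited papers.

import torch.nn as nn

class MultiObjectiveQNet(nn.Module):
    def __init__(self, state_dim, n_actions, n_objectives):
        super().__init__()
        self.n_actions, self.n_objectives = n_actions, n_objectives
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions * n_objectives),
        )

    def forward(self, state):
        # One Q-vector per action: shape (batch, n_actions, n_objectives).
        return self.net(state).view(-1, self.n_actions, self.n_objectives)

online = MultiObjectiveQNet(state_dim=8, n_actions=4, n_objectives=2)
target = MultiObjectiveQNet(state_dim=8, n_actions=4, n_objectives=2)
target.load_state_dict(online.state_dict())  # periodic hard copy keeps bootstrapped targets stable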
“…To the very best of our knowledge, ours is the first application of Multi-Reward RL in the sense of [6] to financial data.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the last couple of years, interest in Deep MORL has intensified, although primarily in single-agent settings (see e.g. [1,33,47,70,74,82,106,111,112]). Very recently, single-objective multi-agent RL has received considerable attention as well [30,32,39,55,97,81,109,130].…”
Section: Deep Multi-objective Multi-agent Decision Making (mentioning)
confidence: 99%
“…For our MORL agent we implement a vanilla DQN as described by Mnih et al. (2015), although our method is easily applicable to most RL algorithms. Recent advances in MTRL, such as the use of UVFAs for generalizing across goals, have become more common in multiple-objective settings (Friedman & Fontaine, 2018; Abels et al., 2018). We also utilize UVFAs to generalize across… As the agent gains experience, tuples of state, action, next state, terminal, and reward vector (s, a, s′, t, r) are stored in a replay buffer for future training.…”
Section: Multi-objective DQN (mentioning)
confidence: 99%
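That last statement describes the transition format quite concretely. Below is a minimal sketch of such a replay buffer, assuming NumPy arrays for states and a reward vector with one component per objective; the class and method names are illustrative and not taken from the cited works.

import random
from collections import deque
import numpy as np

class MultiObjectiveReplayBuffer:
    # Stores (s, a, s', terminal, reward_vector) transitions.
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, next_state, terminal, reward_vec):
        self.buffer.append((state, action, next_state, terminal,
                            np.asarray(reward_vec, dtype=np.float32)))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, next_states, terminals, rewards = zip(*batch)
        # rewards stacks to shape (batch_size, n_objectives).
        return (np.stack(states), np.array(actions), np.stack(next_states),
                np.array(terminals), np.stack(rewards))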