2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)
DOI: 10.1109/icdl-epirob48136.2020.9278062
The Need for MORE: Need Systems as Non-Linear Multi-Objective Reinforcement Learning

Cited by 4 publications (28 citation statements) · References 30 publications
“…Fig. 2: Utility concepts for the multi-objective problem: (i) standard linear scalarization [8], (ii) ranking, in which the first objective always dominates the second one until a threshold of c₀ = 2 is reached, (iii) the non-linear MORE scalarization that corresponds to a softmin function [8], (iv) a MORE scalarization that is shifted by c₀ = 2 on the first objective, giving it higher priority, but also continuously weighting in the second objective when the first one becomes satisfied.…”
Section: Methods
confidence: 99%
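
The caption above contrasts four scalarization schemes for a two-objective value vector q = (q0, q1). As a rough illustration only, the sketch below implements plausible versions of each; the function names, the weights w, and the exact softmin form are assumptions, not the formulations from [8].

```python
import numpy as np

# Illustrative sketches of the four utility concepts from the caption above,
# for a two-objective value vector q = (q0, q1). The exact functional forms
# in [8] are not reproduced here; these are hedged assumptions. The
# threshold c0 mirrors the caption's c0 = 2.

def linear_utility(q, w=(0.5, 0.5)):
    """(i) Standard linear scalarization: a fixed weighted sum."""
    return float(np.dot(w, q))

def ranking_utility(q, c0=2.0):
    """(ii) Ranking: the first objective dominates until it reaches c0;
    only then does the second objective start to count."""
    return q[0] if q[0] < c0 else c0 + q[1]

def softmin_utility(q):
    """(iii) A softmin-style non-linear scalarization (one common form):
    smoothly dominated by the least-satisfied objective."""
    q = np.asarray(q, dtype=float)
    return float(-np.log(np.sum(np.exp(-q))))

def shifted_softmin_utility(q, c0=2.0):
    """(iv) Softmin with the first objective shifted by c0: it gets higher
    priority, but the second objective is continuously weighted in once
    the first one is satisfied."""
    return softmin_utility([q[0] - c0, q[1]])

# Example: once q0 clears the shift c0, q1 drives the shifted utility.
print(shifted_softmin_utility([3.0, 0.2]))  # ~ -0.17, dominated by q1
```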
“…For example, the ranking approach [6] makes use of thresholds and objective ordering to encourage the agent to optimize objectives one at a time. The "Multi-Objective Reward Exponentials" (MORE) framework [8], on the other hand, takes a more dynamic and continuous approach, whereby the accumulated past rewards for all objectives are actively used to dynamically weight the future values of a given objective. The values of each objective are transformed according to an exponential function, which acts as a deficit model and makes the agent focus on the least-achieved objective.…”
Section: Multi-objective Reinforcement Learning
confidence: 99%
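
To make the quoted description concrete, here is a loose sketch of an exponential deficit-style weighting in that spirit; the names `deficit_weights` and `scalarize` and the decay constant `beta` are hypothetical and not taken from the MORE paper.

```python
import numpy as np

# Loose sketch of the deficit idea described above, not the exact MORE
# formulation: each objective's accumulated past reward exponentially
# shrinks its weight on future values, so the least-achieved objective
# dominates the scalarized value. `beta` is an assumed decay constant.

def deficit_weights(accumulated, beta=1.0):
    """Exponential deficit model: well-satisfied objectives get small weights."""
    return np.exp(-beta * np.asarray(accumulated, dtype=float))

def scalarize(q_values, accumulated, beta=1.0):
    """Weight per-objective future values by the current deficits."""
    return float(np.dot(deficit_weights(accumulated, beta), q_values))

# Example: objective 0 is already well satisfied, objective 1 is neglected,
# so objective 1's value dominates the scalarization.
print(deficit_weights([5.0, 0.5]))        # ~[0.007, 0.607]
print(scalarize([1.0, 1.0], [5.0, 0.5]))  # ~0.61, driven by objective 1
```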