The 2013 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2013.6707036

Designing multi-objective multi-armed bandits algorithms: A study

Cited by 88 publications (79 citation statements)
References 7 publications
“…In the multi-objective case, i.e., a multi-objective multi-armed bandit problem (MOBP) (Drugan and Nowé, 2013), we typically need more than one policy, and deterministic policies no longer suffice.…”
Section: Bandit Problems (mentioning)
confidence: 99%
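
A minimal numeric sketch (our own illustration, not taken from the cited works) of why deterministic policies no longer suffice in the multi-objective case: with two objectives, two arms whose mean reward vectors trade off against each other are mutually non-dominated, and a stochastic mixture of them attains expected reward vectors that no single deterministic arm choice reaches.

```python
import numpy as np

# Hypothetical two-arm, two-objective bandit. Neither mean reward vector
# Pareto-dominates the other, so no single deterministic arm is best for
# both objectives at once.
mu_a = np.array([0.9, 0.2])   # arm A: strong on objective 1, weak on objective 2
mu_b = np.array([0.3, 0.8])   # arm B: the opposite trade-off

def dominates(u, v):
    """u Pareto-dominates v: at least as good everywhere, strictly better somewhere."""
    return bool(np.all(u >= v) and np.any(u > v))

print(dominates(mu_a, mu_b), dominates(mu_b, mu_a))   # False False -> both arms are Pareto optimal

# A stochastic (mixed) policy that plays arm A with probability p attains any
# expected reward vector on the segment between mu_a and mu_b.
p = 0.5
print(p * mu_a + (1 - p) * mu_b)                      # [0.6 0.5]
```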
“…However, the learning setting is beyond the scope of this dissertation. Please refer to (Auer and Ortner, 2010; Kuleshov and Precup, 2014) for an overview of BP learning algorithms, and to (Drugan and Nowé, 2013; Yahyaa et al., 2014) for MOBP algorithms.…”
(mentioning)
confidence: 99%
“…The multi-objective multi-armed bandits (MOMAB) [Drugan and Nowe, 2013] algorithms are MABs with reward vectors that import techniques from multi-objective optimisation for an efficient exploration / exploitation trade-off. There are important differences between the MOMAB and the standard MAB algorithms that arise mainly because: (1) there are sets of arms that can be considered to be the best, i.e.…”
Section: Introduction (mentioning)
confidence: 99%
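
A hedged sketch of the "set of best arms" idea from the excerpt above: given (here, illustrative) mean reward vectors, the Pareto front is the set of arms not dominated by any other arm, and it is this whole set, rather than a single optimal arm, that a MOMAB algorithm has to identify and explore. The data and function name are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def pareto_front(means):
    """Indices of arms whose mean reward vector is not Pareto-dominated by any other arm."""
    front = []
    for i, mu_i in enumerate(means):
        dominated = any(
            np.all(mu_j >= mu_i) and np.any(mu_j > mu_i)
            for j, mu_j in enumerate(means) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Illustrative mean vectors for 4 arms and 2 objectives.
means = np.array([
    [0.9, 0.2],
    [0.3, 0.8],
    [0.6, 0.6],
    [0.2, 0.1],   # dominated (e.g. by [0.6, 0.6])
])
print(pareto_front(means))   # [0, 1, 2] -- three mutually incomparable "best" arms
```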
“…The well known exploration/exploitation trade-off plays a very important role in this problem [11]. These methods have also used both the Pareto dominance relation [12] and the scalarization functions [13] to identify the Pareto front. Some multi-objective reinforcement learning algorithms use the lexicographical order relation [14] that assumes that one objective is more important than another objective.…”
Section: Introduction (mentioning)
confidence: 99%
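
A hedged sketch of the scalarized route mentioned in this excerpt: a linear scalarization function collapses each reward vector into a scalar via a weight vector w, after which a standard single-objective rule such as UCB1 applies unchanged; sweeping different weight vectors then picks out different Pareto-optimal arms. The environment, horizon, and parameter values below are illustrative assumptions, not the exact algorithm of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([[0.9, 0.2], [0.3, 0.8], [0.6, 0.6]])   # illustrative arm mean vectors
n_arms, n_obj = true_means.shape

def scalarized_ucb1(weights, horizon=5000):
    """UCB1 on linearly scalarized rewards f_w(r) = w . r for one fixed weight vector."""
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)                               # sums of scalarized rewards
    for t in range(horizon):
        if t < n_arms:                                    # play each arm once to initialise
            arm = t
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        reward_vec = rng.binomial(1, true_means[arm])     # Bernoulli reward in each objective
        counts[arm] += 1
        sums[arm] += weights @ reward_vec
    return int(np.argmax(counts))                         # arm this weight vector converges to

# Different weight vectors emphasise different objectives and so recover
# different Pareto-optimal arms.
for w in (np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])):
    print(w, "->", scalarized_ucb1(w))
```

Sweeping w in this way is how scalarization-based approaches aim to trace out the Pareto front, at the cost of running one scalarized bandit per weight vector; linear scalarizations can only reach points on the convex part of the front, which is one reason Pareto-dominance-based methods are studied alongside them.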