Computation of weighted sums of rewards for concurrent MDPs (2018)
DOI: 10.1007/s00186-018-0653-1

Cited by 22 publications (24 citation statements). References 32 publications.
“…Then, we modify the definition of the set of informative MECs C_I(M_I) in (8) to be (10). In (10), we require the informative MECs in C_I(M_I) to contain at least one informative state-action pair from each set ISA_ij.…”
Section: B. Base Case: No Identity-Revealing Transitions (mentioning)
confidence: 99%
“…Multi-model MDPs: In the literature, there are several names for the model considered in this paper: hidden model MDPs [6], multi-task reinforcement learning [7], multiple-environment MDPs [8], contextual MDPs [9], multi-scenario MDPs and concurrent MDPs [10], latent MDPs [11], and multi-model MDPs [2]. The authors in [6] model the adaptive management problems in conservation biology and natural resources management using a hidden model MDP.…”
Section: Introduction (mentioning)
confidence: 99%
“…Note that reinforcement learning for latent mixture environments here is different from Markov decision processes (MDPs) for non-stationary environments [24], [25], decentralized partially observable Markov decision processes (Dec-POMDP) [26], and multi-model Markov decision processes [27], [28]. For non-stationary environments, both reward functions and state transition distributions are allowed to change with time for a trajectory.…”
Section: Introduction (mentioning)
confidence: 99%
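
The excerpts above all refer to the same basic object: a finite set of MDP models over shared states and actions, with a single policy judged across all of them at once. As a rough illustration of the quantity in the paper's title, the following minimal Python sketch evaluates one stationary policy in each of several models and combines the resulting expected discounted rewards with fixed weights. The model shapes, function names, and the closed-form policy-evaluation step are illustrative assumptions, not the paper's method: the paper concerns computing policies with respect to such weighted sums, whereas this sketch only evaluates a fixed policy.

import numpy as np

def policy_value(P, r, policy, gamma=0.95):
    """Exact discounted value of a deterministic stationary policy in one MDP.

    P: transition tensor, shape (A, S, S), with P[a, s, s'] = Pr(s' | s, a).
    r: reward matrix, shape (S, A).
    policy: integer array of shape (S,) mapping each state to an action.
    Solves the linear system (I - gamma * P_pi) v = r_pi.
    """
    S = r.shape[0]
    P_pi = P[policy, np.arange(S), :]   # (S, S) transition rows under the policy
    r_pi = r[np.arange(S), policy]      # (S,) rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def weighted_value(models, weights, policy, init, gamma=0.95):
    """Weighted sum of one policy's expected discounted rewards over several
    MDP models sharing states and actions (a 'concurrent' or 'multi-model'
    MDP in the terminology quoted above). Hypothetical helper, not the
    paper's algorithm."""
    return sum(w * init @ policy_value(P, r, policy, gamma)
               for (P, r), w in zip(models, weights))

# Hypothetical toy instance: two 2-state, 2-action models that differ
# only in their transition probabilities.
P1 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.6, 0.4]]])
P2 = np.array([[[0.1, 0.9], [0.7, 0.3]],
               [[0.4, 0.6], [0.3, 0.7]]])
r = np.array([[1.0, 0.0], [0.0, 2.0]])
policy = np.array([0, 1])               # one action per state
init = np.array([1.0, 0.0])             # start in state 0

print(weighted_value([(P1, r), (P2, r)], [0.5, 0.5], policy, init))

Under this reading, the distinction drawn in the last excerpt is that each model here is fixed for a whole trajectory (only the model's identity is unknown), whereas in a non-stationary MDP the transitions and rewards themselves may change over time within a trajectory.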