2021
DOI: 10.48550/arxiv.2112.08451
Preprint

Quantum Algorithms for Reinforcement Learning with a Generative Model

Abstract: Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a γ-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy (π*), the optimal value function (v*), and the optimal Q-function (q*), assuming the algorithms can access samples from the …
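
To make the sample-access setting in the abstract concrete, here is a minimal classical sketch of Q-value iteration driven by a generative model (a sampler that returns a next state for any state-action pair). It is not the paper's quantum algorithm, only a classical baseline for the same three outputs (a policy, v, and q). The function names, the toy two-state MDP, and all parameter values are hypothetical and chosen purely for illustration.

```python
import random

def sample_based_q_iteration(sample_next_state, reward, n_states, n_actions,
                             gamma=0.9, n_samples=200, n_iters=100, seed=0):
    """Classical baseline (assumed setup): estimate q* by Monte Carlo averaging
    over next states drawn from a generative model."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        # Greedy value of each state under the current Q estimate.
        v = [max(row) for row in q]
        new_q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                # Empirical mean of v over sampled next states.
                est = sum(v[sample_next_state(s, a, rng)]
                          for _ in range(n_samples)) / n_samples
                new_q[s][a] = reward(s, a) + gamma * est
        q = new_q
    v = [max(row) for row in q]
    policy = [max(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return policy, v, q

# Hypothetical toy MDP: state 1 is absorbing and pays reward 1 for action 0;
# from state 0, action 1 reaches state 1 with probability 0.9.
def toy_next_state(s, a, rng):
    if s == 0 and a == 1:
        return 1 if rng.random() < 0.9 else 0
    return s

def toy_reward(s, a):
    return 1.0 if (s == 1 and a == 0) else 0.0

policy, v, q = sample_based_q_iteration(toy_next_state, toy_reward,
                                        n_states=2, n_actions=2)
```

In this toy instance the greedy policy moves from state 0 toward state 1 and then collects the reward, and the estimate of v at state 1 approaches 1/(1 − γ). The number of calls to the sampler is the quantity the abstract refers to as the number of samples needed.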

Cited by 3 publications
(2 citation statements)
References 26 publications
“…For bandit problems, Casalé et al (Casalé et al 2020) initiated the study of quantum algorithms for best-arm identification of MAB, and Wang et al (Wang et al 2021b) proved optimal results for best-arm identification of MAB with Bernoulli arms. As an extension, Wang et al (Wang et al 2021a) proposed quantum algorithms for finding an optimal policy for a Markov decision process with quantum speedup. These results focused on the exploration of reinforcement learning models, and in terms of the tradeoff between exploration and exploitation, the only work we are aware of is (Lumbreras, Haapasalo, and Tomamichel 2022), which proved that the regret of online learning of properties of quantum states has lower bounds Ω(√T).…”
Section: Introduction
confidence: 99%
“…For bandit problems, Casalé et al [9] initiated the study of quantum algorithms for best-arm identification of MAB, and Wang et al [26] proved optimal results for best-arm identification of MAB with Bernoulli arms. As an extension, Wang et al [25] proposed quantum algorithms for finding an optimal policy for a Markov decision process with quantum speedup. These results focused on exploration of reinforcement learning models, and in terms of the tradeoff between exploration and exploitation, the only work we are aware of is [18], which proved that the regret of online learning of properties of quantum states has lower bounds Ω(√T).…”
Section: Introduction
confidence: 99%