2024
DOI: 10.54097/8mb1rb15

Enhancing UCB-tuned and Asymptotically Optimal UCB Algorithms through Weighted Average Techniques in Multi-Armed Bandit Scenarios

Chang Qu

Abstract: This paper delves into the complexities of the Multi-Armed Bandit (MAB) problem, a fundamental concept in reinforcement learning and probability theory, with a focus on its application in recommendation systems and dynamic fields such as dynamic pricing and investment. It begins by shedding light on the essential paradox at the heart of the MAB problem – the balance between exploration and exploitation within limited parameters. The study primarily centers on Upper Confidence Bound (UCB) policies, especially U…

Cited by 0 publications; references 7 publications.