This study presents a comparative analysis of adaptive strategies in multi-armed bandit algorithms, focusing on the ε-greedy algorithm, the Upper Confidence Bound (UCB) algorithm, and Thompson sampling. Through a series of designed experiments, the research identifies Thompson sampling as the most effective method overall, despite its greater reward fluctuation, highlighting its superior adaptability in uncertain environments. The comparison shows that each algorithm has distinct strengths and weaknesses, so the choice of strategy should be guided by the specific application context. This paper emphasizes the importance of adaptive strategies in optimizing sequential decision-making under stochastic rewards and underscores the need for further work on more sophisticated and better-optimized adaptive strategies to improve algorithmic performance and efficiency. Through this examination, the research contributes insights into a dynamic area of machine learning and offers a foundation for future advances in adaptive strategy optimization.
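For concreteness, the following is a minimal Python sketch of the three strategies compared in this study, run on a Bernoulli bandit. It is an illustration rather than the study's implementation: the arm reward probabilities, horizon, exploration rate ε = 0.1, and Beta(1, 1) priors are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = np.array([0.25, 0.50, 0.75])  # hypothetical arm reward probabilities
K, T = len(p_true), 10_000             # number of arms, time horizon

def eps_greedy(eps=0.1):
    """Explore uniformly with probability eps, otherwise exploit the best mean."""
    counts, values, reward = np.zeros(K), np.zeros(K), 0.0
    for t in range(T):
        a = rng.integers(K) if rng.random() < eps else int(np.argmax(values))
        r = float(rng.random() < p_true[a])
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        reward += r
    return reward

def ucb1():
    """Pick the arm maximizing mean + sqrt(2 ln t / n_a) (UCB1 bonus)."""
    counts, values, reward = np.zeros(K), np.zeros(K), 0.0
    for t in range(T):
        if t < K:
            a = t  # pull each arm once to initialize
        else:
            a = int(np.argmax(values + np.sqrt(2 * np.log(t + 1) / counts)))
        r = float(rng.random() < p_true[a])
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
        reward += r
    return reward

def thompson():
    """Sample a success probability per arm from its Beta posterior; act greedily."""
    alpha, beta, reward = np.ones(K), np.ones(K), 0.0  # assumed Beta(1, 1) priors
    for t in range(T):
        a = int(np.argmax(rng.beta(alpha, beta)))
        r = float(rng.random() < p_true[a])
        alpha[a] += r       # posterior update on success
        beta[a] += 1 - r    # posterior update on failure
        reward += r
    return reward

for name, fn in [("eps-greedy", eps_greedy), ("UCB1", ucb1), ("Thompson", thompson)]:
    print(f"{name:>10}: total reward = {fn():.0f} / {T}")
```

On this toy problem, Thompson sampling's per-run reward varies more across seeds than UCB1's, consistent with the reward fluctuation noted above, while its posterior sampling typically concentrates on the best arm quickly.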