Dynamic pricing seeks to determine the most optimal selling price for a product or service, taking into account factors like limited supply and uncertain demand. This study aims to provide a comprehensive exploration of dynamic pricing using the multi-armed bandit problem framework in various contexts. The investigation highlights the prevalence of Thompson sampling in dynamic pricing scenarios with a Bayesian backdrop, where the seller possesses prior knowledge of demand functions. On the other hand, in non-Bayesian situations, the Upper Confidence Bound (UCB) algorithm family gains traction due to their favorable regret bounds. As markets often exhibit temporal fluctuations, the domain of non-stationary multi-armed bandits within dynamic pricing emerges as crucial. Future research directions include enhancing traditional multi-armed bandit algorithms to suit online learning settings, especially those involving dynamic reward distributions. Additionally, merging prior insights into demand functions with contextual multi-armed bandit approaches holds promise for advancing dynamic pricing strategies. In conclusion, this study sheds light on dynamic pricing through the lens of multi-armed bandit problems, offering insights and pathways for further exploration.