A Bayesian two-armed bandit model is formulated and investigated in this paper with the goal of maximizing the value of a given optimality criterion. The bandit model illustrates the trade-off between exploration and exploitation, where exploration means acquiring scientific knowledge for better-informed decisions at later stages (i.e., maximizing the long-term benefit), and exploitation means applying the current knowledge for the best possible outcome at the current stage (i.e., maximizing the immediate expected payoff). When one arm has known characteristics, stochastic dynamic programming is applied to characterize the optimal strategy and to provide the foundation for its calculation. The results show that the celebrated Gittins index can be approximated by a monotonic sequence of break-even values. When both arms are unknown, we derive a special case in which the myopic strategy is optimal.

KEYWORDS

bandit processes, Bayesian method, Gittins index, Markov decision processes, optimal strategy
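As a rough numerical illustration of the break-even approximation described above, the following Python sketch computes finite-horizon break-even values for an unknown arm against a known arm via backward induction and bisection. The Bernoulli reward model, the Beta(1, 1) prior, and the discount factor BETA = 0.9 are assumptions made for illustration, not specifics taken from the paper.

```python
from functools import lru_cache

# Assumptions (not stated in the abstract): Bernoulli rewards, a Beta
# posterior on the unknown arm, and geometric discounting with factor BETA.
BETA = 0.9

def value(a, b, lam, horizon):
    """Truncated-horizon value of a Bernoulli arm with Beta(a, b) posterior
    competing against a known arm that pays `lam` in expectation per pull."""
    @lru_cache(maxsize=None)
    def V(a_, b_, t):
        if t == 0:
            return 0.0
        # Pulling the known arm never changes the belief, so once it is
        # optimal it stays optimal: its value is a discounted annuity of lam.
        retire = lam * (1.0 - BETA**t) / (1.0 - BETA)
        # Pulling the unknown arm: expected payoff now, Bayes update after.
        p = a_ / (a_ + b_)
        explore = (p * (1.0 + BETA * V(a_ + 1, b_, t - 1))
                   + (1.0 - p) * BETA * V(a_, b_ + 1, t - 1))
        return max(retire, explore)
    return V(a, b, horizon)

def break_even(a, b, horizon, tol=1e-9):
    """Known-arm rate at which the two arms are equally attractive,
    found by bisection on lam in [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        retire = lam * (1.0 - BETA**horizon) / (1.0 - BETA)
        if value(a, b, lam, horizon) > retire + 1e-12:
            lo = lam   # exploring still strictly better: raise lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # In this truncated setting the break-even values increase with the
    # horizon, approaching the discounted Gittins index of a Beta(1, 1) arm.
    for T in (5, 10, 20, 40):
        print(f"horizon {T:3d}: break-even value {break_even(1, 1, T):.6f}")
```

Lengthening the truncation horizon only adds opportunities to exploit what exploration reveals, so the computed break-even values form the kind of monotonic approximating sequence the abstract refers to.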