We examine a general Bayesian framework for constructing on-line prediction algorithms in the experts setting. These algorithms predict the bits of an unknown Boolean sequence using the advice of a finite set of experts. In this framework we use probabilistic assumptions on the unknown sequence to motivate prediction strategies. However, the relative bounds that we prove on the number of prediction mistakes made by these strategies hold for any sequence. The Bayesian framework provides a unified derivation and analysis of previously known prediction strategies, such as the Weighted Majority and Binomial Weighting algorithms. Furthermore, it provides a principled way of automatically adapting the parameters of Weighted Majority to the sequence, in contrast to previous ad hoc doubling techniques. Finally, we discuss the generalization of our methods to algorithms making randomized predictions.
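The Weighted Majority algorithm mentioned above admits a short illustrative sketch. The code below is a minimal, assumed implementation of the standard deterministic Weighted Majority rule (it is not taken from this paper): each expert starts with weight 1, the master predicts by a weighted vote, and every expert that errs has its weight multiplied by a penalty `beta` in [0, 1).

```python
def weighted_majority(expert_advice, outcomes, beta=0.5):
    """Predict each bit by a weighted vote over expert predictions.

    expert_advice: T rows, each a list of N bits (one per expert).
    outcomes: the T true bits of the sequence.
    beta: multiplicative penalty in [0, 1) applied to wrong experts.
    Returns the number of mistakes made by the master.
    """
    n = len(expert_advice[0])
    weights = [1.0] * n
    mistakes = 0
    for advice, outcome in zip(expert_advice, outcomes):
        # Weighted vote: total weight behind prediction 1 vs. 0.
        vote_one = sum(w for w, a in zip(weights, advice) if a == 1)
        vote_zero = sum(weights) - vote_one
        prediction = 1 if vote_one >= vote_zero else 0
        if prediction != outcome:
            mistakes += 1
        # Penalize every expert that was wrong on this bit.
        weights = [w * beta if a != outcome else w
                   for w, a in zip(weights, advice)]
    return mistakes
```

With this update rule the master's mistake count can be bounded relative to the best expert's mistake count plus a term logarithmic in N, for any sequence; the choice of `beta` trades off these two terms, which is the parameter-tuning issue the Bayesian framework addresses.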
Introduction.

A fundamental problem in learning theory is to predict the bits of an unknown Boolean sequence. The problem is uninteresting when the algorithm is required to minimize its worst-case number of mistakes over all sequences, since no algorithm can then do better than random guessing. A richer problem results if the algorithm is given a (finite) set of models and the sequence is reasonably close to one generated by some model in the set. Interesting "relative" mistake bounds, depending on the distance between the unknown Boolean sequence and the closest model, can then be proven. This is sometimes referred to as the "experts" setting, since the models can be viewed as "experts" providing "advice" to the algorithm. Variants and extensions of this experts setting have been extensively studied by Littlestone and Warmuth [10], Vovk [12], Cesa-Bianchi et al. [2], [3], Haussler et al. [6], and others in the area of computational learning theory. Here we use a Bayesian approach to derive prediction algorithms with good performance in the experts setting. A crucial aspect of this work is that, although the algorithms are derived by making probabilistic assumptions about the generation of the sequence to be predicted, they are analyzed in the adversarial experts setting.

In this experts setting, a "master algorithm" attempts to predict, one by one, the bits of an unknown sequence. Before predicting each bit, the master is allowed to listen to the "advice" provided by a pool of N experts. After each bit is revealed, the master