Optimal learning addresses the problem of how to collect information so that it benefits future decisions. In off‐line problems, we must make a series of measurements or observations before choosing a final design or set of parameters; in on‐line problems, we learn from the rewards we receive, and we must balance rewards earned now against better decisions in the future. This article reviews these problems, describes optimal and heuristic policies, and shows how to compare competing policies. The presentation then focuses on the concept of the knowledge gradient, which guides information collection by maximizing the marginal value of information. We show how this idea can be applied to both on‐line and off‐line problems, as well as to a broad range of other applications that have not previously yielded to formal techniques.