Learning models extend the traditional discrete choice framework by postulating that consumers have incomplete information about product attributes, and that they learn about these attributes over time. In this survey we describe the literature on learning models that has developed over the past 20 years, using the model of Erdem and Keane (1996) as a unifying framework. We describe how subsequent work has extended their modeling framework and applied learning models to a wide range of products and markets. We argue that learning models have contributed greatly to our understanding of consumer behavior, in particular by enhancing our understanding of brand loyalty and long-run advertising effects. We also discuss the limitations of existing learning models and potential extensions. One key challenge is to disentangle learning as a source of dynamics from other key mechanisms that may generate choice dynamics (inventories, habit persistence, etc.). Another is to enhance identification of learning models by collecting and utilizing direct measures of signals, perceptions and expectations.

1 Until fairly recently, logit was much more popular than probit, largely due to computational advantages. But advances in simulation methods, such as the GHK algorithm and Gibbs sampling (see Geweke and Keane (2001), McCulloch, Polson and Rossi (2000)), have greatly increased the popularity of probit, particularly among Bayesians.

2 Thus, learning models play havoc with traditional welfare analysis, as consumer surplus is no longer the area under the demand curve. Parameters of the demand curve are no longer structural parameters of preferences, but depend on the information set (e.g., they can be shifted by advertising). See Erdem, Keane and Sun (2008) for a discussion.

exogenously, or "active search," exerting effort to gather information. Or they may do both. A fourth distinction is how consumers learn.
They may be Bayesians, or they may update perceptions in some other way. For instance, consumers may over/under weight new information relative to an optimal Bayesian rule, or forget information that was received too far in the past.

Learning models were first applied to marketing problems in pioneering work by Roberts and Urban (1988) and Eckstein, Horsky and Raban (1988). 3 But, due to technical limitations of the time (both in computer speed and estimation algorithms), their models had to be quite simple. Roberts and Urban (1988) study how Bayesian consumers learn about a new product from word-of-mouth signals. Consumers in their model are risk averse, but myopic, so there is no active search. In contrast, in Eckstein et al. (1988) consumers are forward-looking, and trial purchase is the (only) source of information. But utility is linear, so their model exhibits the "value of information" phenomenon, but not the "brand equity" phenomenon created by risk aversion. For Roberts and Urban (1988) the converse is true. The strong simplifying assumptions of these early models, plus the difficulty of estimating...