As online shopping becomes ever more prevalent, customers increasingly rely on product rating websites to make purchase decisions. The reliability of online ratings, however, is potentially compromised by the so-called herding effect: when rating a product, customers may be biased to follow other customers' previous ratings of the same product. This is problematic because haphazard early ratings can then skew long-term customer perception. The study of herding poses methodological challenges. In particular, observational studies are impeded by the lack of counterfactuals: simply correlating early with subsequent ratings is insufficient because we cannot know what the subsequent ratings would have looked like had the first ratings been different. The methodology introduced here exploits a setting that comes close to an experiment although it is purely observational, that is, a natural experiment. Our key methodological device consists in studying the same product on two separate rating sites, focusing on products that received a high first rating on one site and a low first rating on the other. This largely controls for confounds such as a product's inherent quality, advertising, and producer identity, and lets us isolate the effect of the first rating on subsequent ratings. In a case study, we focus on beers as products and jointly study two beer rating sites, but our method applies to any pair of sites across which products can be matched. We find clear evidence of herding in beer ratings. For instance, if a beer receives a very high first rating, its second rating is on average half a standard deviation higher than when the identical beer receives a very low first rating. Moreover, herding effects tend to last a long time and are noticeable even after 20 or more ratings. Our results have important implications for the design of better rating systems.
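
The matched-pair comparison described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `second_rating_gaps`, the thresholds `high` and `low`, and the toy data are all hypothetical, and ratings are assumed to be already standardized per site (z-scores) and listed in chronological order.

```python
from statistics import mean


def second_rating_gaps(site_a, site_b, high=1.0, low=-1.0):
    """For products matched across two sites, compare second ratings.

    site_a, site_b: dicts mapping a product id to its chronological list of
    standardized ratings (assumption: z-scored within each site).
    high, low: hypothetical cutoffs for a "high" and a "low" first rating.

    Returns, for every product whose first rating is high on one site and low
    on the other, the second rating on the high-first site minus the second
    rating on the low-first site.
    """
    gaps = []
    # Only products that appear on both sites can be matched.
    for product in sorted(site_a.keys() & site_b.keys()):
        a, b = site_a[product], site_b[product]
        if len(a) < 2 or len(b) < 2:
            continue  # a second rating is needed on both sites
        if a[0] >= high and b[0] <= low:
            gaps.append(a[1] - b[1])
        elif b[0] >= high and a[0] <= low:
            gaps.append(b[1] - a[1])
    return gaps


# Toy data, invented purely for illustration.
site_a = {"beer1": [1.5, 0.6, 0.2], "beer2": [-1.2, -0.3, 0.1], "beer3": [0.1, 0.0]}
site_b = {"beer1": [-1.4, -0.1, 0.0], "beer2": [1.3, 0.4, 0.2], "beer3": [1.6, 0.5]}

gaps = second_rating_gaps(site_a, site_b)
print(len(gaps), "matched pairs; mean gap in second rating:", mean(gaps))
```

Because each gap compares the same product with itself, product-level confounds cancel within a pair; a positive mean gap is what herding would predict. Later ratings could be compared the same way by replacing index 1 with a later position.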