This paper presents a detailed discussion of problem formulation and data representation issues in the design, deployment, and operation of a massive-scale machine learning system for targeted display advertising. Notably, the machine learning system itself, rather than only the models it produces, is deployed and has been in continual use for years across thousands of advertising campaigns. In this application, acquiring sufficient training data from the ideal sampling distribution is prohibitively expensive. Instead, data are drawn from surrogate domains and learning tasks, and then transferred to the target task. We present the design of this multistage transfer learning system, highlighting the problem formulation aspects. We then present a detailed experimental evaluation, showing that each of the transfer stages adds value. We next present production results for advertising clients from a variety of industries, illustrating the performance of the system in use. We close the paper with a collection of lessons learned from over half a decade of work on this complex, deployed, and broadly used machine learning system.