This paper presents a detailed discussion of problem formulation and data representation issues in the design, deployment, and operation of a massive-scale machine learning system for targeted display advertising. Notably, the machine learning system itself is deployed and has been in continual use for years, for thousands of advertising campaigns (in contrast to deploying only the models that the system produces). In this application, acquiring sufficient training data from the ideal sampling distribution is prohibitively expensive. Instead, data are drawn from surrogate domains and learning tasks, and then transferred to the target task. We present the design of this multistage transfer learning system, highlighting the problem formulation aspects. We then present a detailed experimental evaluation, showing that each of the transfer stages adds value. We next present production results across a variety of advertising clients from a variety of industries, illustrating the performance of the system in use. We close the paper with a collection of lessons learned from more than half a decade of work on this complex, deployed, and broadly used machine learning system.
ProtoMol is a high-performance framework in C++ for rapid prototyping of novel algorithms for molecular dynamics and related applications. Its flexibility is achieved primarily through the use of inheritance and design patterns (object-oriented programming). Performance is obtained by using templates that enable generation of efficient code for performance-critical sections (generic programming). The framework encapsulates important optimizations that can be used by developers, such as parallelism in the force computation. Its design is based on domain analysis of numerical integrators for molecular dynamics (MD) and of fast solvers for the force computation, particularly the forces due to electrostatic interactions. Several new and efficient algorithms are implemented in ProtoMol. Finally, it is shown that ProtoMol's sequential performance is excellent when compared to a leading MD program, and that it scales well for a moderate number of processors. Binaries and source code for Windows, Linux, Solaris, IRIX, HP-UX, and AIX platforms are available under an open source license at http://protomol.sourceforge.net.
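As a rough illustration of the design idea described in this abstract (inheritance for flexibility, templates for performance-critical code), the following is a minimal sketch. All names here (`System`, `HarmonicSpring`, `computePairForces`, `Integrator`, `VelocityVerlet`) are hypothetical and are not taken from ProtoMol's actual API; the 1-D spring system and the velocity Verlet scheme are assumptions chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D particle system; not ProtoMol's data structures.
struct System {
    std::vector<double> x;  // positions
    std::vector<double> v;  // velocities
    std::vector<double> f;  // forces
    double mass = 1.0;      // same mass for every particle
};

// Illustrative pair potential: a harmonic spring with zero rest length.
struct HarmonicSpring {
    double k;  // spring constant
    // Force on particle i for separation r = x_i - x_j.
    double force(double r) const { return -k * r; }
};

// Generic programming: the potential is a template parameter, so the call
// inside the inner loop is resolved at compile time and can be inlined.
template <class Potential>
void computePairForces(System& s, const Potential& pot) {
    assert(s.x.size() == s.f.size());
    for (double& fi : s.f) fi = 0.0;
    for (std::size_t i = 0; i < s.x.size(); ++i) {
        for (std::size_t j = i + 1; j < s.x.size(); ++j) {
            const double fij = pot.force(s.x[i] - s.x[j]);
            s.f[i] += fij;
            s.f[j] -= fij;  // equal and opposite force on the partner
        }
    }
}

// Object-oriented programming: integrators share a virtual interface,
// so they can be exchanged at run time without touching client code.
class Integrator {
public:
    virtual ~Integrator() = default;
    virtual void step(System& s, double dt) = 0;
};

// One concrete integrator (velocity Verlet). It assumes s.f already holds
// the forces for the current positions when step() is called.
template <class Potential>
class VelocityVerlet : public Integrator {
public:
    explicit VelocityVerlet(Potential pot) : pot_(pot) {}

    void step(System& s, double dt) override {
        const std::size_t n = s.x.size();
        for (std::size_t i = 0; i < n; ++i) s.v[i] += 0.5 * dt * s.f[i] / s.mass;
        for (std::size_t i = 0; i < n; ++i) s.x[i] += dt * s.v[i];
        computePairForces(s, pot_);  // forces at the new positions
        for (std::size_t i = 0; i < n; ++i) s.v[i] += 0.5 * dt * s.f[i] / s.mass;
    }

private:
    Potential pot_;
};
```

In this sketch the virtual call happens once per time step, while the per-pair force evaluation, which dominates the cost, goes through the template parameter and carries no virtual dispatch. A caller would initialize the forces once with `computePairForces` and then drive the simulation through an `Integrator&`.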