In the online advertising market, it is crucial to provide advertisers with a reliable measurement of advertising effectiveness so that they can plan marketing campaigns better. The basic idea behind ad-effectiveness measurement is to compare performance (e.g., success rate) between users who were and were not exposed to a given ad treatment. When a randomized experiment is not available, a naive comparison can be biased because the exposed and unexposed populations typically have different features. One solid methodology for a fair comparison is to apply inverse propensity weighting with doubly robust estimation to the observational data. However, existing methods were not designed for online advertising campaigns, which typically suffer from a huge volume of users, high dimensionality, high sparsity, and class imbalance. We propose an efficient framework that addresses these challenges in a real campaign setting. We use gradient boosting stumps for feature selection and gradient boosting trees for model fitting, and we propose a subsampling-and-backscaling procedure that enables analysis of extremely sparse conversion data. The choice of features, models, and feature-selection scheme is validated with an irrelevant-conversion test. We further propose a parallel computing strategy that, combined with the subsampling-and-backscaling procedure, achieves computational efficiency. Applied to an online campaign involving millions of unique users, our framework shows substantially better model fitting and efficiency. It can be generalized to comparisons of multiple treatments and more general treatment regimes, as sketched in the paper. Our framework is not limited to online advertising; it also applies to other settings (e.g., social science) where a 'fair' comparison must be drawn from observational data.
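The doubly robust estimator mentioned above combines an outcome model with inverse propensity weighting, so the effect estimate remains consistent if either model is correct. A minimal sketch of the standard augmented-IPW average-treatment-effect formula is below; the function name and arguments are illustrative, not the paper's actual implementation (which fits the nuisance models with gradient boosting):

```python
import numpy as np

def doubly_robust_ate(y, t, e_hat, mu0_hat, mu1_hat):
    """Augmented-IPW (doubly robust) estimate of the average treatment effect.

    y       : observed outcomes (e.g., conversion indicators)
    t       : binary treatment indicator (1 = exposed to the ad)
    e_hat   : estimated propensity scores P(T = 1 | X)
    mu0_hat : outcome-model predictions E[Y | X, T = 0]
    mu1_hat : outcome-model predictions E[Y | X, T = 1]
    """
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    e_hat = np.asarray(e_hat, dtype=float)
    mu0_hat, mu1_hat = np.asarray(mu0_hat, float), np.asarray(mu1_hat, float)
    # Outcome-model prediction plus an IPW correction on the residuals.
    dr1 = mu1_hat + t * (y - mu1_hat) / e_hat
    dr0 = mu0_hat + (1.0 - t) * (y - mu0_hat) / (1.0 - e_hat)
    return float(np.mean(dr1 - dr0))
```

In practice the propensity and outcome models would be fitted on the (subsampled) campaign data, with predictions backscaled as described in the abstract.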
We consider the problem of estimating occurrence rates of rare events in extremely sparse data, using pre-existing hierarchies and selected features to perform inference along multiple dimensions. In particular, we focus on estimating click rates for {Advertiser, Publisher, User} tuples, where both the Advertisers and the Publishers are organized as hierarchies that capture broad contextual information at different levels of granularity. Typically, the click rates are low, and the coverage of the hierarchies and dimensions is sparse. To overcome these difficulties, we decompose the joint prior of the three-dimensional click-through rate using tensor decomposition and propose a multidimensional hierarchical Bayesian framework (abbreviated as MadHab). We set up a specific model for each dimension to capture its dimension-specific characteristics: a hierarchical beta process prior for the Advertiser dimension and for the Publisher dimension, respectively, and a feature-dependent mixture model for the User dimension. Besides the centralized implementation, we propose two distributed inference algorithms, based on MapReduce and Spark, which make the model highly scalable and well suited to large-scale data-mining applications. We demonstrate that, on a real-world ads campaign platform, our framework can effectively discriminate extremely rare events in terms of their click propensity.
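The tensor decomposition referenced above factors the three-dimensional CTR array into per-dimension components. As a hedged illustration only (the paper's actual model places Bayesian priors on the factors, whereas this sketch uses a plain rank-1 alternating-least-squares fit), the core idea q[p,a,u] ≈ q_p[p] · q_a[a] · q_u[u] can be written as:

```python
import numpy as np

def fit_rank1_ctr(Q, n_iter=50, seed=0):
    """Fit a rank-1 decomposition q[p,a,u] ~ q_p[p] * q_a[a] * q_u[u]
    to a CTR tensor Q by alternating least squares (illustrative only)."""
    P, A, U = Q.shape
    rng = np.random.default_rng(seed)
    q_p, q_a, q_u = rng.random(P), rng.random(A), rng.random(U)
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two fixed.
        q_p = np.einsum('ijk,j,k->i', Q, q_a, q_u) / ((q_a @ q_a) * (q_u @ q_u))
        q_a = np.einsum('ijk,i,k->j', Q, q_p, q_u) / ((q_p @ q_p) * (q_u @ q_u))
        q_u = np.einsum('ijk,i,j->k', Q, q_p, q_a) / ((q_p @ q_p) * (q_a @ q_a))
    return q_p, q_a, q_u
```

Sparsity is what motivates the decomposition: instead of estimating P·A·U cells directly, only P + A + U factor entries (plus their hierarchical priors) must be learned.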
We thank Drs. Zhang and Lee, two leading researchers in the area of computational advertising, for their positive comments, pointers to related work and open areas, and insightful questions. Our comments are as follows.

Interactions between factors. Both discussants comment on the interactions between factors. In the ads application, we have three natural dimensions, publisher (p), advertiser (a), and user (u), each with its own hierarchy. Each forthcoming impression can be related to the three dimensions at different levels of their hierarchies, and we propose to decompose the click-through rate (CTR) q_{p,a,u} of each impression into q_p q_a q_u via tensor decomposition, accompanied by shrinkage priors for each hierarchy. Drs. Zhang and Lee both pointed to the paper by Agarwal et al. [1], which provides a framework that works well for modeling pairwise interactions in two-dimensional hierarchies. However, as Agarwal et al. [1] also note, the direct extension to K-dimensional hierarchies does not perform well because of the increased sparsity. Dr. Lee also pointed to the direction of using Gaussian processes for tensor factorization. This is a very interesting direction as well, and we believe extra work is needed on parallel computation that can guarantee convergence in practice. Beyond these, the emerging area of deep learning [2] should also be a promising direction for handling high-order interactions in our framework.

Model fitting. Dr. Zhang also discussed several keys to successfully deploying our framework in practice through the Weierstrass sampler using Spark, especially when too few samples fall into some hierarchy combinations of the tuple {p, a, u}. Indeed, we must be extremely careful when deploying through MapReduce. However, we would like to clarify two points about the practical implementation: (i) each CTR prediction is accompanied by a credible interval (CI) estimate.
When the CI is too wide, we go one level up and use the parent's CTR prediction. (ii) Because our model fitting involves iterative computations over extremely large datasets, we consider asynchronous iterations for MapReduce, similar to [3]. Experiments with different data-analysis applications over real-world and synthetic datasets show that asynchronous iterations outperform plain Hadoop for iterative algorithms, reducing the execution time of iterative applications by 25% on average.

Final comments. Again, we thank both the discussants and the editors. We hope our paper sheds some light on the very challenging problem of CTR prediction in practical deployment. We look forward to future developments in both methodology and applications in the field of computational advertising.
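The CI-based fallback in point (i) above, where a prediction with too wide an interval is replaced by its parent's, can be sketched as a simple walk up the hierarchy. The data layout and function name here are hypothetical, chosen only to make the rule concrete:

```python
def backoff_ctr(node, estimates, max_ci_width):
    """Return the CTR of the first ancestor (starting at `node`) whose
    credible interval is narrow enough.

    estimates: dict mapping node -> (ctr, ci_low, ci_high, parent),
               with parent = None at the hierarchy root.
    """
    while node is not None:
        ctr, ci_low, ci_high, parent = estimates[node]
        if ci_high - ci_low <= max_ci_width:
            return ctr  # interval is tight enough; trust this level
        node = parent   # otherwise back off to the parent node
    raise ValueError("no ancestor with a sufficiently narrow interval")
```

This mirrors how hierarchical smoothing handles sparse {p, a, u} combinations: cells with too little data borrow strength from coarser levels.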