Han Cai scite author profile

Product-Based Neural Networks for User Response Prediction

Abstract: Predicting user responses, such as clicks and conversions, is of great importance and is used in many Web applications, including recommender systems, web search, and online advertising. The data in those applications is mostly categorical and contains multiple fields; a typical approach is to transform it into a high-dimensional sparse binary feature representation via one-hot encoding. Faced with this extreme sparsity, traditional models may be limited in their capacity to mine shallow patterns from the data, i.e., low-order feature combinations. Deep models such as deep neural networks, on the other hand, cannot be applied directly to the high-dimensional input because of the huge feature space. In this paper, we propose Product-based Neural Networks (PNN) with an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions. Our experimental results on two large-scale real-world ad click datasets demonstrate that PNNs consistently outperform the state-of-the-art models on various metrics.
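The three stages named in the abstract (embedding layer, product layer, fully connected layers) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an inner-product variant with a single hidden layer, no bias terms, and randomly initialised weights, and the names `pnn_forward` and `random_params` are hypothetical.

```python
import math
import random


def pnn_forward(field_ids, embeddings, w_hidden, w_out):
    """Forward pass of a toy inner-product PNN for one sample.

    field_ids:  one categorical index per field (the one-hot position).
    embeddings: embeddings[f][i] is the vector for category i of field f.
    w_hidden:   one weight row per hidden unit.
    w_out:      one output weight per hidden unit.
    """
    # Embedding layer: a one-hot input reduces to a table lookup per field.
    vecs = [embeddings[f][i] for f, i in enumerate(field_ids)]

    # Product layer, linear part: the concatenated field embeddings.
    z = [x for v in vecs for x in v]
    # Product layer, pairwise part: inner product of every pair of fields.
    p = [sum(a * b for a, b in zip(vecs[i], vecs[j]))
         for i in range(len(vecs)) for j in range(i + 1, len(vecs))]

    features = z + p
    # Fully connected layer with ReLU, then a sigmoid output.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def random_params(field_sizes, dim, n_hidden, seed=0):
    """Random (untrained) parameters, for illustration only."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    embeddings = [[vec(dim) for _ in range(size)] for size in field_sizes]
    n_fields = len(field_sizes)
    n_features = n_fields * dim + n_fields * (n_fields - 1) // 2
    w_hidden = [vec(n_features) for _ in range(n_hidden)]
    w_out = vec(n_hidden)
    return embeddings, w_hidden, w_out


# Three fields with 3, 4 and 2 categories; one sample picks one category each.
embeddings, w_hidden, w_out = random_params([3, 4, 2], dim=4, n_hidden=5)
prob = pnn_forward([2, 0, 1], embeddings, w_hidden, w_out)
print(f"predicted click probability: {prob:.3f}")
```

With three fields and 4-dimensional embeddings, the product layer in this sketch yields 12 linear features plus 3 pairwise inner products, so the hidden layer sees 15 inputs rather than the full one-hot space.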


Real-Time Bidding by Reinforcement Learning in Display Advertising

Cai et al. 2017


The majority of online display ads are served through real-time bidding (RTB): each ad display impression is auctioned off in real time, just as it is generated by a user visit. To place an ad automatically and optimally, it is critical for advertisers to devise a learning algorithm that bids cleverly on each ad impression in real time. Most previous works treat the bid decision as a static optimization problem, either valuing each impression independently or setting a bid price for each segment of ad volume. However, bidding for a given ad campaign happens repeatedly during its life span until the budget runs out. As such, the bids are strategically correlated through the constrained budget and the overall effectiveness of the campaign (e.g., the rewards from generated clicks), which is only observed after the campaign has completed. Thus, it is of great interest to devise an optimal sequential bidding strategy so that the campaign budget can be dynamically allocated across all the available impressions on the basis of both immediate and future rewards. In this paper, we formulate the bid decision process as a reinforcement learning problem, where the state space is represented by the auction information and the campaign's real-time parameters, while an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize the advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem arising from the large real-world auction volume and campaign budget is handled by state value approximation using neural networks. Comment: WSDM 201
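The Markov Decision Process view described above can be sketched as a small exact dynamic program. The sketch rests on assumptions that the abstract does not state: a second-price auction, a known discrete market-price distribution, a constant value per won impression, and a state reduced to (auctions left, budget left). The function name `optimal_bidding` is hypothetical, and the paper's neural value approximation is not shown.

```python
from functools import lru_cache


def optimal_bidding(market_price_pmf, click_value, n_auctions, budget):
    """Solve a toy budget-constrained bidding MDP by dynamic programming.

    market_price_pmf[p] is the assumed probability that the highest
    competing bid equals p (integer prices). A bid wins when it is at
    least the market price, and the winner pays the market price.
    Returns (expected total value, optimal first bid).
    """
    max_price = len(market_price_pmf) - 1

    @lru_cache(maxsize=None)
    def value(t, b):
        # t: auctions remaining, b: budget remaining.
        if t == 0:
            return 0.0, 0
        lose_value = value(t - 1, b)[0]
        best, best_bid = float("-inf"), 0
        # Never bid more than the remaining budget.
        for bid in range(min(b, max_price) + 1):
            total = 0.0
            for price, prob in enumerate(market_price_pmf):
                if price <= bid:
                    # Win: collect the value, pay the market price.
                    total += prob * (click_value + value(t - 1, b - price)[0])
                else:
                    # Lose: budget is unchanged.
                    total += prob * lose_value
            if total > best:
                best, best_bid = total, bid
        return best, best_bid

    return value(n_auctions, budget)


# Market price is 1 or 2 with equal probability; each win is worth 1.
pmf = [0.0, 0.5, 0.5]
print(optimal_bidding(pmf, click_value=1.0, n_auctions=1, budget=2))
print(optimal_bidding(pmf, click_value=1.0, n_auctions=2, budget=2))
```

The table grows with the product of auction count and budget, which is the scalability issue the abstract addresses with value function approximation.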


Crossing Over the Bounded Domain: From Exponential to Power-Law Intermeeting Time in Mobile Ad Hoc Networks

Cai and Eun, 2009, IEEE/ACM Trans. Networking



Abstract: Inter-meeting time between mobile nodes is one of the key metrics in a Mobile Ad-hoc Network (MANET) and is central to the end-to-end delay of forwarding algorithms. In many performance studies of MANETs it is assumed to be exponentially distributed, or it is shown numerically to be exponentially distributed under most existing mobility models in the literature. However, recent empirical results show otherwise: the inter-meeting time distribution in fact follows a power law. This outright discrepancy potentially undermines our understanding of the performance tradeoffs in MANETs obtained under the exponential distribution of the inter-meeting time, and thus calls for further study of the power-law inter-meeting time, including its fundamental cause, mobility modeling, and its effect. In this paper, we rigorously prove that a finite domain, on which most of the current mobility models are defined, plays an important role in creating the exponential tail of the inter-meeting time. We also prove that by simply removing the boundary in a simple two-dimensional isotropic random walk model, we are able to obtain the empirically observed power-law decay of the inter-meeting time. We then discuss the relationship between the size of the boundary and the relevant timescale of the network scenario under consideration. Our results thus provide guidelines on mobility modeling with a power-law inter-meeting time distribution, on new protocols including packet forwarding algorithms, and on their performance analysis.

Index Terms: mobile ad-hoc network, inter-meeting time distribution, exponential vs. power-law, bounded domain, time and space scaling.
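The effect of the boundary can be explored with a small simulation. It is a rough sketch, not the paper's model: it assumes lattice walkers rather than a continuous isotropic random walk, uses a torus as a simple stand-in for a bounded domain, and counts the steps until two walkers are co-located again. The function name `intermeeting_times` and all parameter values are hypothetical.

```python
import random

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def intermeeting_times(n_samples, size=None, max_steps=200, seed=0):
    """Sample times until two independent lattice walkers meet again.

    size=None means an unbounded plane; otherwise both walkers live on a
    size x size torus. Samples that have not met after max_steps are
    censored and reported as max_steps.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_samples):
        # Only the relative position matters; the walkers start together.
        dx = dy = 0
        t = 0
        while t < max_steps:
            ax, ay = rng.choice(STEPS)
            bx, by = rng.choice(STEPS)
            dx += ax - bx
            dy += ay - by
            if size is not None:
                # Wrap the relative position around the torus.
                dx %= size
                dy %= size
            t += 1
            if dx == 0 and dy == 0:
                break
        times.append(t)
    return times


bounded = intermeeting_times(200, size=3)
unbounded = intermeeting_times(200, size=None)
print("longest time, bounded domain:  ", max(bounded))
print("censored samples, unbounded:   ", sum(t == 200 for t in unbounded))
```

In runs of this kind the bounded domain tends to produce short meeting times, while the unbounded plane leaves many samples censored at `max_steps`, which is in line with the heavier tail the abstract describes. A proper tail comparison would need far more samples and a log-log plot.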


HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

et al. 2020


Transformers are ubiquitous in Natural Language Processing (NLP) tasks, but they are difficult to deploy on hardware because of their intensive computation. To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search. We first construct a large design space with arbitrary encoder-decoder attention and heterogeneous layers. Then we train a SuperTransformer that covers all candidates in the design space and efficiently produces many SubTransformers with weight sharing. Finally, we perform an evolutionary search with a hardware latency constraint to find a specialized SubTransformer dedicated to running fast on the target hardware. Extensive experiments on four machine translation tasks demonstrate that HAT can discover efficient models for different hardware (CPU, GPU, IoT device). When running the WMT'14 translation task on a Raspberry Pi-4, HAT achieves a 3× speedup and 3.7× smaller size over the baseline Transformer, and a 2.7× speedup and 3.6× smaller size over the Evolved Transformer, with 12,041× less search cost and no performance loss. HAT is open-sourced.
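The final step, an evolutionary search under a latency constraint, can be sketched as follows. Everything concrete here is an assumption made for illustration: the three-knob design space `SPACE`, the analytic `latency_ms` and `proxy_loss` stand-ins (a real system would measure or predict latency on the target device and score SubTransformers with inherited SuperTransformer weights), and the population settings. It is not the released HAT code.

```python
import random

# Hypothetical design space; the real one is larger and differs in detail.
SPACE = {
    "layers": [1, 2, 3, 4, 5, 6],
    "embed_dim": [256, 512, 640],
    "ffn_dim": [512, 1024, 2048, 3072],
}


def latency_ms(cfg):
    # Stand-in latency model (assumption): grows with model size.
    return cfg["layers"] * (cfg["embed_dim"] + cfg["ffn_dim"]) / 100.0


def proxy_loss(cfg):
    # Stand-in quality score (assumption): larger models score lower loss.
    return 10.0 / (cfg["layers"] * (cfg["embed_dim"] + cfg["ffn_dim"]) ** 0.5)


def evolutionary_search(max_latency, population=20, n_parents=5,
                        iterations=15, seed=0):
    """Return the lowest-loss config found with latency <= max_latency."""
    smallest = {k: min(v) for k, v in SPACE.items()}
    if latency_ms(smallest) > max_latency:
        raise ValueError("no configuration satisfies the latency constraint")
    rng = random.Random(seed)

    def feasible(make_candidate):
        # Resample until the candidate meets the latency constraint.
        while True:
            cfg = make_candidate()
            if latency_ms(cfg) <= max_latency:
                return cfg

    def random_cfg():
        return {k: rng.choice(v) for k, v in SPACE.items()}

    def mutate(cfg, prob=0.3):
        return {k: rng.choice(SPACE[k]) if rng.random() < prob else v
                for k, v in cfg.items()}

    def crossover(a, b):
        return {k: rng.choice((a[k], b[k])) for k in SPACE}

    pop = [feasible(random_cfg) for _ in range(population)]
    for _ in range(iterations):
        # Keep the best candidates as parents for the next generation.
        parents = sorted(pop, key=proxy_loss)[:n_parents]
        n_children = population - n_parents
        mutated = [feasible(lambda: mutate(rng.choice(parents)))
                   for _ in range(n_children // 2)]
        crossed = [feasible(lambda: crossover(rng.choice(parents),
                                              rng.choice(parents)))
                   for _ in range(n_children - n_children // 2)]
        pop = parents + mutated + crossed
    return min(pop, key=proxy_loss)


best = evolutionary_search(max_latency=30.0)
print(best, latency_ms(best))
```

Rejecting infeasible children at generation time keeps every member of the population within the latency budget, so the returned configuration always satisfies the constraint.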


APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

et al. 2020


We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them jointly. To deal with the larger design space this brings, a promising approach is to train a quantization-aware accuracy predictor that quickly estimates the accuracy of the quantized model and feeds it to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of (quantized model, accuracy) pairs, which involves quantization-aware fine-tuning and is thus highly time-consuming. To tackle this challenge, we propose to transfer the knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves the sample efficiency. Moreover, collecting the dataset for the fp32 accuracy predictor only requires evaluating neural networks, without any training cost, by sampling from a pretrained once-for-all [3] network, which is highly efficient. Extensive experiments on ImageNet demonstrate the benefits of our joint optimization approach. At the same accuracy, APQ reduces the latency/energy by 2×/1.3× over MobileNetV2+HAQ [30,36]. Compared to the separate optimization approach (ProxylessNAS+AMC+HAQ [5,12,36]), APQ achieves 2.3% higher ImageNet accuracy while reducing GPU hours and CO2 emissions by orders of magnitude, pushing the frontier for green AI that is environmentally friendly. The code and video are publicly available.


