Load balancing, operator instance collocation and horizontal scaling are critical issues in Parallel Stream Processing Engines for achieving low data processing latency, minimized communication cost and optimized cluster utilization, respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled, in the sense that they all need to determine the allocation of workloads and migrate computational states at runtime; optimizing them independently would therefore result in suboptimal solutions. In this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the load-balancing problem as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. Extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.
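To make the load-balancing subproblem concrete, the sketch below assigns key groups with known loads to a fixed number of operator instances so that the maximum instance load stays small. It is not the Mixed-Integer Linear Program or ALBIC described above: it swaps in the classic greedy longest-processing-time (LPT) heuristic, and it ignores state-migration and communication costs, which the integrated formulation accounts for. The function name `assign_key_groups` and the key-group loads are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_key_groups(loads: Dict[str, float],
                      num_instances: int) -> Tuple[Dict[str, int], List[float]]:
    """Greedy LPT assignment of key groups to operator instances (hypothetical helper).

    Heaviest key groups are placed first, each on the currently least-loaded
    instance. This is a heuristic, not an exact MILP solution, and it ignores
    migration and communication costs.
    """
    if num_instances <= 0:
        raise ValueError("num_instances must be positive")
    # Min-heap of (current load, instance id); ties are broken by instance id.
    heap = [(0.0, i) for i in range(num_instances)]
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    # Sort by descending load, then by name for a deterministic result.
    for group, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        current, inst = heapq.heappop(heap)
        assignment[group] = inst
        heapq.heappush(heap, (current + load, inst))
    # Recompute per-instance totals from the final assignment.
    instance_loads = [0.0] * num_instances
    for group, inst in assignment.items():
        instance_loads[inst] += loads[group]
    return assignment, instance_loads


if __name__ == "__main__":
    demo = {"k1": 7, "k2": 5, "k3": 4, "k4": 3, "k5": 1}
    print(assign_key_groups(demo, 2))
```

With these illustrative loads the two instances end up with equal totals. A real rebalancer would also weigh how many key groups move between instances, since each move requires migrating operator state at runtime.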
Recently, there has been increasing interest in building distributed platforms for processing fast data streams. In this demonstration, we highlight the need for elasticity in distributed data stream processing systems and present Enorm, a data stream processing platform with a focus on elasticity, i.e., the ability to dynamically scale resource usage according to runtime workload fluctuations. To achieve dynamic scaling with minimal overhead and latency, we use an integrated approach to both fault tolerance and elasticity. The idea is that both fault tolerance and elasticity essentially require replicating or migrating computation states among different nodes. Integrating and sharing the state management operations between the two modules not only provides abundant opportunities to reduce the system's runtime overhead but also simplifies the system's architecture.
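The point that fault tolerance and elasticity share the same state operations can be illustrated with a small sketch in which one snapshot/restore primitive backs both checkpoint-based recovery and key migration during scale-out. The classes `Node` and `StateManager` and their methods are hypothetical and do not reflect Enorm's actual API; operator state is assumed to be a simple per-key counter map.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    # Operator state partitioned by key, e.g. running counts per key (assumption).
    state: Dict[str, int] = field(default_factory=dict)


class StateManager:
    """Hypothetical shared state layer: one snapshot/restore primitive serves
    both checkpointing (fault tolerance) and migration (elastic scaling)."""

    def __init__(self) -> None:
        self.backups: Dict[str, Dict[str, int]] = {}

    def snapshot(self, node: Node, keys: List[str]) -> Dict[str, int]:
        # Deep copy so later updates on the node do not alter the snapshot.
        return copy.deepcopy({k: node.state[k] for k in keys if k in node.state})

    def restore(self, node: Node, snap: Dict[str, int]) -> None:
        node.state.update(copy.deepcopy(snap))

    # Fault tolerance: checkpoint the node's whole state.
    def checkpoint(self, node: Node) -> None:
        self.backups[node.name] = self.snapshot(node, list(node.state))

    # Recovery restores the last checkpoint; updates made after it would have
    # to be replayed from the input stream in a real system.
    def recover(self, failed: Node, replacement: Node) -> None:
        self.restore(replacement, self.backups.get(failed.name, {}))

    # Elasticity: move a subset of keys to another node using the same primitives.
    def migrate(self, src: Node, dst: Node, keys: List[str]) -> None:
        self.restore(dst, self.snapshot(src, keys))
        for k in keys:
            src.state.pop(k, None)
            # Keep the source checkpoint consistent with the new key ownership.
            self.backups.get(src.name, {}).pop(k, None)
        self.checkpoint(dst)
```

For example, after `checkpoint(a)` and `migrate(a, b, ["y"])`, recovering `a` onto a fresh node restores only the keys `a` still owns, while `b` already holds a checkpoint for `y`. Sharing the primitives is what keeps the two modules from maintaining separate, possibly conflicting copies of state.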
We study the problem of optimal bid selection across ads and time, with the aim of maximizing incoming click traffic to the advertiser's landing page, which translates directly into maximizing revenue. A major novelty of our approach lies in using Machine Learning (ML) to build regression models from available data that derive, for each ad, the following relations: (i) the cost-per-click (CPC) charged by the platform versus the bid, (ii) the assigned ad position in the ad list versus the bid value, and (iii) the number of ad clicks versus the ad's position. These regression models naturally reveal hidden trends that would otherwise be unavailable to the advertiser, such as the bidding behavior of competing advertisers and the quality scores of their ads. We then incorporate these relations into a convex optimization problem of budget allocation across ads and across time, whose solution is the advertiser's optimal bidding strategy. We validate our approach with real data provided by an online advertising company that is active in the banking sector. Our solution leads to a substantial increase in inbound click traffic to the advertiser's landing page compared to other approaches that are either heuristic and data-agnostic or employ simple statistics on the data.
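As an illustration of budget allocation as a convex problem, the sketch below splits a budget across ads when each ad's expected clicks are a concave function of its spend. The response curve `a_i * log(1 + b_i / c_i)` (with `a_i > 0` a click scale and `c_i > 0` a saturation constant) is an assumption standing in for the ML-derived relations above, the allocation is across ads only (not across time), and `allocate_budget` is a hypothetical name. The KKT conditions give `b_i = max(0, a_i / lam - c_i)` for a multiplier `lam`, found here by bisection, a water-filling style solution.

```python
from typing import List, Sequence


def allocate_budget(a: Sequence[float], c: Sequence[float], budget: float,
                    iters: int = 200) -> List[float]:
    """Maximize sum_i a_i*log(1 + b_i/c_i) s.t. sum_i b_i = budget, b_i >= 0.

    Assumes a_i > 0 and c_i > 0. From the KKT conditions,
    b_i(lam) = max(0, a_i/lam - c_i); total spend is non-increasing in the
    multiplier lam, so lam is found by bisection.
    """
    if budget <= 0:
        return [0.0] * len(a)

    def spend(lam: float) -> List[float]:
        return [max(0.0, ai / lam - ci) for ai, ci in zip(a, c)]

    # At lam = max(a_i/c_i) no ad receives budget, so total spend is 0.
    hi = max(ai / ci for ai, ci in zip(a, c))
    # Halve lam until total spend reaches the budget (spend grows as lam -> 0).
    lo = hi
    while sum(spend(lo)) < budget:
        lo /= 2.0
    # Invariant: spend(lo) >= budget >= spend(hi).
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(spend(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return spend(hi)


if __name__ == "__main__":
    # Two hypothetical ads: the first yields more clicks per unit of spend.
    print(allocate_budget([2.0, 1.0], [1.0, 1.0], budget=1.0))
```

With these made-up parameters the whole budget goes to the first ad, because its marginal click rate stays above the second ad's starting rate. Any concave, increasing response curve fitted from data could replace the logarithmic one without changing the bisection logic.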