Estimating resource usage for streaming workloads in emerging application domains such as enterprise computing, smart cities, remote healthcare, and astronomy has emerged as a challenging research problem. Estimating the resources needed to process continuous queries over streaming data is difficult because of (i) uncertain stream arrival patterns, (ii) the need to process different mixes of queries, and (iii) varying resource consumption. Existing techniques approximate the resource usage of a query as a single point value, which may not be sufficient: a point value is neither expressive enough nor able to capture these characteristics of streaming workloads. In this paper, we present a novel approach that uses mixture density networks to estimate the whole spectrum of resource usage as probability density functions. We have evaluated our technique using the Linear Road benchmark and TPC-H in both private and public clouds. The efficiency and applicability of the proposed approach are demonstrated via two novel applications: (i) predictable auto-scaling policy setting, which highlights the potential of distribution prediction for the consistent definition of cloud elasticity rules; and (ii) a distribution-based admission controller, which efficiently admits or rejects incoming queries based on probabilistic SLA compliance goals.
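To illustrate why a predicted distribution is more useful than a point value for admission control, the following is a minimal sketch and not the authors' implementation. It assumes the mixture density network outputs a Gaussian mixture (weights, means, standard deviations) for a query's resource usage; the function names `mixture_cdf` and `admit`, the parameter values, and the SLA threshold are all hypothetical.

```python
import math


def mixture_cdf(x, weights, means, stds):
    """Return P(usage <= x) under a Gaussian mixture.

    Assumption: the mixture components are Gaussian and the weights sum to 1.
    """
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, stds)
    )


def admit(capacity, weights, means, stds, sla_target):
    """Admit a query if its usage stays within `capacity` with
    probability at least `sla_target` (a hypothetical probabilistic SLA)."""
    return mixture_cdf(capacity, weights, means, stds) >= sla_target


# Hypothetical predicted distribution for one query: bimodal CPU usage (%).
weights = [0.7, 0.3]
means = [40.0, 80.0]
stds = [5.0, 10.0]

# A point estimate (the mixture mean) hides the heavy second mode.
point_estimate = sum(w * m for w, m in zip(weights, means))

print(point_estimate)
print(admit(100.0, weights, means, stds, sla_target=0.95))
print(admit(60.0, weights, means, stds, sla_target=0.95))
```

In this made-up example the mixture mean falls below 60%, so a point-value check against a 60% capacity would accept the query, while the distribution-based check rejects it because a substantial share of the probability mass lies above that capacity.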