Large Scale predictive analytics for real-time energy management

Balac, Natasha; Sipes, T.; Wolter, Nicole; Nunes, Kenneth; Sinkovits, Bob; Karimabadi, H.

doi:10.1109/bigdata.2013.6691635

Cited by 27 publications

(21 citation statements)

References 7 publications


“…With multidimensional range queries, analysts define the subspaces in R^d of interest within the overall data space. High quality cardinality prediction in such subspaces then becomes important for data mining, data exploration, time series analysis, and big data visualization tasks [9,12] of data (sub)spaces of interest.…”

Section: Definition 1 (Range Query

mentioning (confidence: 99%)

“…Frequently, data analysts, data scientists, and statisticians are in search of approximate answers to such queries over unknown data subspaces, which supports knowledge discovery and underlying data function estimation. Imagine exploratory and predictive analytics [9] based on a stream of such aggregation operators over data subspaces being issued, until the scientists/analysts extract sufficient statistics or fit local function estimators, e.g., coefficient of determination, product-moment correlation coefficient, and multivariate local linear approximation over the subspaces of interest.…”

Section: Introduction

mentioning (confidence: 99%)


Scalable aggregation predictive analytics

2017


Christos Anagnostopoulos (christos.anagnostopoulos@glasgow.ac.uk), Peter Triantafillou (peter.triantafillou@glasgow.ac.uk); School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK

We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries' answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of the contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark's COUNT method.
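The query-driven idea in this abstract, predicting a COUNT answer from past queries and their answers without touching the data, can be illustrated with a minimal sketch. It is not the authors' model: it swaps the local linear regression estimators for a simpler distance-weighted average over the k most similar past queries, and the query encoding, the `CountPredictor` class, and the choice of Euclidean distance as the similarity measure are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical encoding: (lo_1, ..., lo_d, hi_1, ..., hi_d) for a d-dimensional range query.
Query = Tuple[float, ...]


def encode(lower: Sequence[float], upper: Sequence[float]) -> Query:
    """Flatten the lower and upper bounds of a range query into one vector."""
    return tuple(lower) + tuple(upper)


def distance(a: Query, b: Query) -> float:
    """Euclidean distance, used here as a stand-in for query similarity."""
    return math.dist(a, b)


class CountPredictor:
    """Predicts COUNT answers from past (query, count) pairs only."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.history: List[Tuple[Query, float]] = []

    def observe(self, query: Query, count: float) -> None:
        """Record an executed query together with its true COUNT answer."""
        self.history.append((query, count))

    def predict(self, query: Query) -> float:
        """Distance-weighted average of the counts of the k nearest past queries."""
        if not self.history:
            raise ValueError("no past queries to learn from")
        nearest = sorted(self.history, key=lambda qc: distance(qc[0], query))[: self.k]
        # Small constant avoids division by zero for an exact repeat of a past query.
        weights = [1.0 / (distance(q, query) + 1e-9) for q, _ in nearest]
        return sum(w * c for w, (_, c) in zip(weights, nearest)) / sum(weights)
```

For example, after observing the one-dimensional queries [0, 1] with count 10 and [0, 2] with count 20, a prediction for [0, 1.5] lies halfway between the two, since both past queries are equally similar to it.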


“…With m-d range queries, analysts define the subspaces of interest within the overall data space. SCP in such subspaces then becomes important for data mining, query-driven data exploration, time series analysis, and big data visualization tasks [8], [4] of data (sub)spaces of interest. In exploratory and predictive analytics, a data scientist routinely defines specific regions of a large dataset that are worth exploring and wishes to derive and predict statistics over the populations of these regions - which amounts to SCP of the corresponding range queries.…”

Section: Introduction

mentioning (confidence: 99%)

“…Frequently, data analysts and statisticians are in search of (approximate and/or partial) answers to such queries over unknown data subspaces (knowledge discovery). Imagine exploratory and predictive analytics [4] based on a stream of such aggregation operators over data subspaces being issued, until the scientist extracts sufficient statistics or learns local statistical characteristics, e.g., coefficient of determination and product-moment correlation coefficient, of the subspaces of interest.…”

Section: Introduction

mentioning (confidence: 99%)

Learning to accurately COUNT with query-driven predictive analytics

Anagnostopoulos

Triantafillou

2015

2015 IEEE International Conference on Big Data (Big Data)


Abstract: We study a novel solution to executing aggregation (and specifically COUNT) queries over large-scale data. The proposed solution is generally applicable, in the sense that it can be deployed in environments in which data owners may or may not restrict access to their data and allow only 'aggregation operators' to be executed over their data. For this, it is based on predictive analytics, driven by queries and their results. We propose a machine learning (ML) framework for the task (which can be adapted for different aggregates as well). We focus on the widely used set-cardinality (i.e., COUNT) aggregation operator, as it is a fundamental operator for both internal data system optimisations and for aggregation-query analytics. We contribute a novel, query-driven ML model whose goals are to: (i) learn the query space (access patterns), (ii) associate (complex) aggregation queries with the cardinality of their results, and (iii) define query similarity and use it to predict the cardinality of the answer set of an ad-hoc incoming query. Our ML model incorporates incremental learning algorithms for ensuring high prediction accuracy even when both the querying patterns and the underlying data change. The significance of the contribution lies in that it (i) is the only query-driven solution applicable over general environments which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for big data analytics, and (iii) offers a performance (in terms of prediction accuracy and time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity and comparative advantages versus acclaimed data-centric methods (self-tuning histograms, sampling, and multidimensional histograms).
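The incremental learning mentioned in this abstract can be pictured with a small sketch in which the model keeps a handful of query prototypes and updates them one observed query at a time. This is an illustration under stated assumptions, not the paper's algorithm: the `Prototype` and `IncrementalCountModel` names, the `radius` threshold for spawning a new prototype, and the running-mean update rule are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Prototype:
    center: List[float]   # representative query vector
    mean_count: float     # running mean of the COUNT answers seen near this prototype
    hits: int = 1         # number of queries absorbed so far


class IncrementalCountModel:
    """Online model updated one (query, count) pair at a time."""

    def __init__(self, radius: float = 1.0) -> None:
        self.radius = radius  # hypothetical threshold for creating a new prototype
        self.prototypes: List[Prototype] = []

    def _nearest(self, query: Sequence[float]) -> Optional[Prototype]:
        return min(self.prototypes, key=lambda p: math.dist(p.center, query), default=None)

    def update(self, query: Sequence[float], count: float) -> None:
        """Absorb a newly executed query and its true answer."""
        p = self._nearest(query)
        if p is None or math.dist(p.center, query) > self.radius:
            # Query falls in an unexplored region: start a new prototype there.
            self.prototypes.append(Prototype(list(query), float(count)))
            return
        p.hits += 1
        rate = 1.0 / p.hits  # running-mean step size
        p.center = [c + rate * (x - c) for c, x in zip(p.center, query)]
        p.mean_count += rate * (count - p.mean_count)

    def predict(self, query: Sequence[float]) -> float:
        """Return the running mean count of the closest prototype."""
        p = self._nearest(query)
        if p is None:
            raise ValueError("model has not seen any queries yet")
        return p.mean_count
```

With the `1 / hits` step size each prototype converges to the plain average of what it has seen. Replacing it with a small constant rate would weight recent queries more heavily, which is one way such a sketch could follow changing query patterns or data.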


“…However, these rule-inference methods do not guarantee optimality. Other contributions exploit data mining approaches to predict the building energy performance: Fan et al [21] use ensemble models for predicting next-day energy consumption and peak power, whereas Balac et al [22] develop a highly scalable framework capable of analysing and predicting the building behaviour considering alternative energy sources and smart grid constraints in real-time.…”

Section: Related Work

mentioning (confidence: 99%)

Energy Optimization and Management of Demand Response Interactions in a Smart Campus

et al. 2016


Abstract: The proposed framework enables innovative power management in smart campuses, integrating local renewable energy sources, battery banks and controllable loads and supporting Demand Response interactions with the electricity grid operators. The paper describes each system component: the Energy Management System responsible for power usage scheduling, the telecommunication infrastructure in charge of data exchange, and the integrated data repository devoted to information storage. We also discuss the relevant use cases and validate the framework in a few deployed demonstrators.
