Abstract: The widespread deployment of smart meters that frequently report energy consumption information is a known threat to consumers' privacy. Many promising privacy protection mechanisms based on secure aggregation schemes have been proposed. Even though these schemes are cryptographically secure, the energy provider has access to the plaintext aggregated power consumption. A privacy trade-off exists between the size of the aggregation scheme and the personal data that might be leaked, where smaller aggregation sizes leak more personal data. Recently, a UK industrial body has studied this privacy trade-off and identified that two smart meters forming an aggregate are sufficient to achieve privacy. In this work, we challenge this study and investigate which aggregation sizes are sufficient to achieve privacy in the smart grid. To this end, we propose a flexible, yet formal privacy metric using a cryptographic game-based definition. Studying publicly available, real-world energy consumption datasets with various temporal resolutions, ranging from minutes to hourly intervals, we show that a typical household can be identified with very high probability. For example, we observe a 50% advantage over random guessing in identifying households for an aggregation size of 20 households with a 15-minute reporting interval. Furthermore, our results indicate that single appliances can be identified with significant probability in aggregation sizes up to 10 households.
As email workloads keep rising, email servers need to handle this explosive growth while offering good quality of service to users. In this work, we focus on modeling the workload of the email servers of four universities (2 from Greece, 1 from the UK, 1 from Australia). We model all types of email traffic, including user and system emails, as well as spam. We initially tested some of the most popular distributions for workload characterization and used statistical tests to evaluate our findings. The significant differences in the prediction accuracy results for the four datasets led us to investigate the use of a Recurrent Neural Network (RNN) for time series modeling of the server workload, which is a first for such a problem. Our results show that RNN modeling leads, in most cases, to high modeling accuracy for all four campus email traffic datasets.
Analysis of time series data has been a challenging research subject for decades. Email traffic has recently been modelled as a time series function using a Recurrent Neural Network (RNN), and RNNs were shown to provide higher prediction accuracy than previous probabilistic models from the literature. Given the exponential rise of email workloads that need to be handled by email servers, in this paper we first present and discuss the literature on modelling email traffic. We then explain the advantages and limitations of different approaches as well as their points of agreement and disagreement. Finally, we present a comprehensive comparison between the performance of RNN and Long Short-Term Memory (LSTM) models. Our experimental results demonstrate that both approaches can achieve high accuracy over four large datasets acquired from different universities' servers, outperforming existing work, and show that the use of LSTM and RNN is very promising for modelling email traffic.