Datasets allow researchers to build models, evaluate their performance, and reproduce experiments published by others. This data paper presents the NILM Performance Evaluation dataset (NILMPEds), which is aimed primarily at research reproducibility in the field of Non-Intrusive Load Monitoring (NILM). This initial release of NILMPEds is dedicated to event detection algorithms and comprises ground-truth data for four test datasets, the specification of 47,950 event detection models, the power events returned by each model in the four test datasets, and the performance of each individual model according to 31 performance metrics.
Summary

Public datasets are crucial elements in data science research: they allow researchers to perform systematic evaluations and benchmarks of their algorithms, and they enable other researchers to replicate and reproduce existing research [1].

Non-Intrusive Load Monitoring (NILM, or load disaggregation) is the process of estimating the energy consumption of individual appliances from electric power measurements taken at a limited number of locations in the electrical distribution of a building [2]. A typical NILM dataset is a collection of electrical energy measurements taken from the mains (i.e., aggregate consumption) and from the individual loads (i.e., ground-truth data), which are obtained either by measuring each load at the plug level or by measuring the circuit to which the load is connected [3].

As presented in a recent review [3], there are over 20 public datasets for NILM research. According to the same review, these datasets can be categorized by their suitability for evaluating event-based and event-less approaches [4]. Event-based strategies seek to disaggregate the total consumption by detecting and labeling appliance transitions (referred to as power events) in the aggregated signal. As such, datasets for event-based NILM must also include labels for the power events from the appliances of interest. Event-less approaches, on the other hand, attempt to match each sample of the aggregated power to the consumption of one specific device or a combination of different devices. Therefore, datasets for event-less approaches do not require any labeled transitions.

Existing NILM datasets are also categorized according to their data reporting rates [5]: macroscopic datasets, with data reporting rates around 1 Hz, and microscopic datasets, with rates of several kHz.
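To make the notion of a power event more concrete, the following minimal sketch flags step changes in a synthetic aggregate power signal. It is only an illustration: the function name `detect_power_events`, the 30 W threshold, and the sample values are assumptions made for this example and do not correspond to any of the event detection models included in NILMPEds.

```python
from typing import List, Tuple


def detect_power_events(
    power: List[float], threshold: float = 30.0
) -> List[Tuple[int, float]]:
    """Return (sample_index, power_change) for each step of at least `threshold` watts.

    Hypothetical helper for illustration only; it is not one of the
    event detection models described in the dataset.
    """
    events = []
    for i in range(1, len(power)):
        # Difference between consecutive samples of the aggregate signal.
        delta = power[i] - power[i - 1]
        if abs(delta) >= threshold:
            events.append((i, delta))
    return events


# Synthetic aggregate signal at a macroscopic rate of 1 Hz (watts, made-up values):
# a 60 W base load, with an appliance of roughly 1000 W switching on and then off.
aggregate = [60.0, 61.0, 60.0, 1060.0, 1062.0, 1061.0, 59.0, 60.0]
events = detect_power_events(aggregate, threshold=30.0)
print(events)
```

Under these assumptions, a positive change marks a candidate switch-on transition and a negative change a candidate switch-off. A dataset for event-based NILM would supply the labeled positions of such transitions, against which the detected events can be compared.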