Machine learning (ML) algorithms offer great potential for identifying suitable energy efficiency measures for buildings with little manual effort. Although many building automation systems (BAS) record large amounts of data, technical systems such as boilers each provide only a few data points per building. ML algorithms, however, must be trained on a sufficient number of instances of a technical system before they can be used across buildings. In contrast to electrical systems, few data sets from the actual operation of thermal systems are publicly available. Since 2012, the monitoring system in our test building has continuously recorded threshold-based data with a maximum resolution of 1 minute. In total, we monitor the plants, energy consumption and comfort parameters with 9239 data points. In this paper, we describe the structure of the data set we have published for this building. To facilitate the use of ML, each data point is given a uniform label according to a previously developed approach. Because the documentation of ML data sets varies across the building sector, we present an approach for standardizing data sets with dedicated datasheets for thermal systems that provide sufficient information for applying ML. We use the Brick Schema, a unified ontology standard for describing the topology of buildings that is part of the forthcoming ASHRAE Standard 223P, and couple it with our approach for the structured labeling of data points in buildings. We show how physical models based on an open-source Modelica library can be generated semi-automatically from this ontology-based model. We show that these models, enriched with measured time-series data and datasheets, agree well with the measured data. Finally, we demonstrate with an ML example that our approach based on the Brick Schema and Modelica can deliver ML-compliant data sets.
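To make the idea of generating a physical model from an ontology-based description more concrete, the following is a minimal sketch, not the authors' tooling. It assumes the topology is available as Brick-style triples stored as plain Python tuples. It also assumes that every `brick:feeds` relation becomes one `connect()` equation between generic fluid ports. The component names (`Lib.Boiler`, `Lib.Pump`, `Lib.Radiator`), the port names `port_a`/`port_b` and the function `generate_modelica` are hypothetical placeholders and do not refer to the open-source Modelica library used in the paper.

```python
# Minimal sketch: turn Brick-style topology triples into a Modelica model skeleton.
# Assumptions (hypothetical, not the paper's implementation):
#  - triples are plain (subject, predicate, object) tuples, not an RDF graph
#  - "Lib.*" component names and port_a/port_b are placeholders
#  - each brick:feeds relation between mapped equipment becomes one connect()

BRICK_TO_MODELICA = {
    "brick:Boiler": "Lib.Boiler",      # placeholder component types
    "brick:Pump": "Lib.Pump",
    "brick:Radiator": "Lib.Radiator",
}


def generate_modelica(model_name, triples):
    """Return Modelica source text for the mapped equipment and their feeds relations."""
    # Collect the class of every entity from rdf:type triples.
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    # Keep only entities whose class has a Modelica counterpart (sensors etc. are skipped).
    mapped = {e: BRICK_TO_MODELICA[c] for e, c in types.items() if c in BRICK_TO_MODELICA}

    lines = [f"model {model_name}"]
    for entity in sorted(mapped):
        lines.append(f"  {mapped[entity]} {entity};")
    lines.append("equation")
    for s, p, o in triples:
        if p == "brick:feeds" and s in mapped and o in mapped:
            lines.append(f"  connect({s}.port_b, {o}.port_a);")
    lines.append(f"end {model_name};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical heating loop: boiler -> pump -> radiator, plus an unmapped sensor.
    triples = [
        ("boiler1", "rdf:type", "brick:Boiler"),
        ("pump1", "rdf:type", "brick:Pump"),
        ("rad1", "rdf:type", "brick:Radiator"),
        ("tSupply", "rdf:type", "brick:Temperature_Sensor"),
        ("boiler1", "brick:feeds", "pump1"),
        ("pump1", "brick:feeds", "rad1"),
    ]
    print(generate_modelica("HeatingLoop", triples))
```

In this sketch, equipment without a mapped Modelica counterpart (here the temperature sensor) is left out of the physical model. In a workflow like the one described above, such data points would instead supply the measured time series that the simulated model is compared against.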