Data-driven black-box surrogate models are widely used in research on
building energy efficiency. They are based on machine learning techniques,
learn from available data, and act as a replacement for or an addition to
complex and computationally intensive knowledge-based models. Surrogate
models can predict energy demand, indoor air temperature, or occupant
behavior; explore search spaces in optimization problems; learn control
rules; and more. This paper analyzes surrogate models that classify building
retrofit measures directly according to the global cost. In addition, they
quantify the importance of each variable for the classification process. The
models are based on random forest classifiers, which are fast and powerful
ensemble learners. They can be applied to effectively reduce search spaces
when optimizing energy renovation measures or to rapidly identify projects
that deserve financial support. This approach is applied to two residential
buildings under three price-development scenarios. The models are trained on
a small share of the retrofit options, whose heating and cooling demands and
global cost are assessed with standard calculations. The results
show very high classification performance, even when the models are trained
with small and imbalanced training sets. The obtained recall, precision, and
F-score values are mostly above 95%, except for extremely small training
sets.
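To make the classification idea concrete, here is a minimal sketch that assumes scikit-learn's `RandomForestClassifier` and a made-up cost function. Everything specific in it is hypothetical: the three retrofit variables (`insulation_cm`, `window_u`, `pv_kw`), the toy "global cost" formula, and the choice of labeling the cheapest 10% of options as the positive class. None of these come from the paper, whose features, labels, and standard global cost calculations are not detailed here. The sketch trains on a small, imbalanced share of the options and prints impurity-based variable importances together with an F-score on the unseen options.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_options = 2000

# Hypothetical retrofit variables (one row per retrofit option)
insulation_cm = rng.uniform(0, 30, n_options)  # added insulation thickness
window_u = rng.uniform(0.7, 2.8, n_options)    # window U-value, W/(m2 K)
pv_kw = rng.uniform(0, 10, n_options)          # installed PV capacity
X = np.column_stack([insulation_cm, window_u, pv_kw])

# Toy stand-in for the global cost (arbitrary units); NOT a standard calculation
investment = 40 * insulation_cm + 300 / window_u + 150 * pv_kw
operation = 3000 / (1 + 0.1 * insulation_cm) + 400 * window_u - 80 * pv_kw
global_cost = investment + operation

# Hypothetical labeling: the cheapest 10% of options are class 1 (imbalanced)
y = (global_cost <= np.quantile(global_cost, 0.10)).astype(int)

# Train on a small share (10%) of the options, evaluate on the remaining 90%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.1, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# F-score of the positive (cost-effective) class on options not seen in training
f1 = f1_score(y_test, clf.predict(X_test))

# Impurity-based importances, normalized by scikit-learn to sum to 1
names = ["insulation_cm", "window_u", "pv_kw"]
for name, importance in zip(names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
print(f"F-score on unseen options: {f1:.3f}")
```

Impurity-based `feature_importances_` is only one way to rank variables; permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative, and the abstract does not state which measure the authors use.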
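Recall, precision, and F-score are more informative than plain accuracy when the classes are imbalanced, because a model that never predicts the minority class can still look accurate. The short sketch below computes all four quantities from scratch for a positive class; the label vectors are hypothetical (10% of options labeled 1, e.g. "cost-effective") and are not data from the paper.

```python
def scores(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F-score) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed positives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f_score


# Hypothetical imbalanced labels: 10 positives out of 100 options
y_true = [1] * 10 + [0] * 90

# A trivial model that always predicts the majority class
always_negative = [0] * 100
print([round(s, 3) for s in scores(y_true, always_negative)])  # [0.9, 0.0, 0.0, 0.0]

# A model that finds 9 of the 10 positives and raises 1 false alarm
model_pred = [1] * 9 + [0] + [1] + [0] * 89
print([round(s, 3) for s in scores(y_true, model_pred)])  # [0.98, 0.9, 0.9, 0.9]
```

The trivial model reaches 90% accuracy yet has zero recall, precision, and F-score, which is why these three metrics are the relevant ones for small, imbalanced training sets.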