Although energy efficiency is a prominent topic in the context of global climate change, European Union directives, and national energy policies, the methodology for estimating energy efficiency still relies on standard techniques defined by experts in the field. Recent research shows the potential of machine learning methods to produce models that assess energy efficiency from previously collected data. In this paper, we analyse a real dataset of public buildings in Croatia, extract their most important features using correlation analysis and chi-square tests, cluster the buildings on three selected features, and create a prediction model of energy efficiency for each cluster using the artificial neural network (ANN) methodology. The main objective of this research was to investigate whether a clustering procedure improves the accuracy of a neural network prediction model. For that purpose, the symmetric mean absolute percentage error (SMAPE) was used to compare the accuracy of the initial prediction model obtained on the whole dataset with that of the separate models obtained on each cluster. The results show that the clustering procedure did not increase the prediction accuracy of the models. These preliminary findings can be used to set goals for future research, which can focus on estimating clusters using more features, conducting more extensive variable reduction, and testing more machine learning algorithms to obtain more accurate models that will enable cost reductions in the public sector.
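The comparison described above rests on SMAPE, commonly expanded as symmetric mean absolute percentage error. The abstract does not say which variant of the formula was used, so the following is only a minimal sketch under an assumed definition: the mean of |predicted - actual| divided by the average of |actual| and |predicted|, expressed in percent. The function names `smape` and `pooled_smape`, and the handling of zero denominators, are illustrative assumptions and not taken from the paper.

```python
def smape(actual, predicted):
    """SMAPE in percent (range 0 to 200) under the assumed definition:
    mean of |p - a| / ((|a| + |p|) / 2) over all observations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        # Assumption: a pair where both values are zero contributes no error.
        if denom != 0:
            total += abs(p - a) / denom
    return 100.0 * total / len(actual)


def pooled_smape(per_cluster):
    """SMAPE over all clusters combined.

    per_cluster is a list of (actual, predicted) pairs, one per cluster,
    where each cluster's predictions come from its own model. Pooling the
    observations gives a figure comparable to the whole-dataset model.
    """
    all_actual, all_predicted = [], []
    for actual, predicted in per_cluster:
        all_actual.extend(actual)
        all_predicted.extend(predicted)
    return smape(all_actual, all_predicted)
```

Under this sketch, clustering would count as an improvement only if `pooled_smape` over the per-cluster models were lower than `smape` of the single model on the same observations.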
Abstract. The paper aims to establish an efficient model for predicting company growth by leveraging the strengths of logistic regression and neural networks. A real dataset of Croatian companies was used, with the input space describing the relevant industry sector, financial ratios, income, and assets, and a binary dependent variable indicating whether a company was high-growth, defined as annualized growth in assets of more than 20% a year over a three-year period. Due to the large number of input variables, factor analysis was performed in the pre-processing stage in order to extract the most important input components. Building an efficient model with a high classification rate and explanatory ability required the application of two data mining methods: logistic regression as a parametric method and neural networks as a non-parametric method. The methods were tested on models with and without variable reduction. The classification accuracy of the models was compared using statistical tests and ROC curves. The results showed that neural networks produce significantly higher classification accuracy in the model that incorporates all available variables. The paper further discusses the advantages and disadvantages of both approaches, i.e. logistic regression and neural networks, in modelling company growth. The suggested model is potentially of benefit to investors and economic policy makers, as it provides support for recognizing companies with growth potential, especially during times of economic downturn.
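The high-growth label defined above can be illustrated with a short sketch. The abstract does not state how growth was annualized, so this assumes compound (geometric) annualization of total assets between the start and end of the period; the function names and parameters are hypothetical.

```python
def annualized_growth(start_assets, end_assets, years=3):
    """Compound annual growth rate of assets over the period.

    Assumption: growth is annualized geometrically; the paper may have
    used a different convention.
    """
    if start_assets <= 0 or years <= 0:
        raise ValueError("start_assets and years must be positive")
    return (end_assets / start_assets) ** (1.0 / years) - 1.0


def is_high_growth(start_assets, end_assets, years=3, threshold=0.20):
    """Binary target: True if annualized asset growth exceeds the threshold."""
    return annualized_growth(start_assets, end_assets, years) > threshold
```

Under this convention, a company needs its assets to grow by more than a factor of 1.2^3 = 1.728 over three years to be labelled high-growth.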