In this study we use machine learning to perform explainable business sector prediction from financial statements. Financial statements are a valuable source of information on the financial state and performance of firms. Recently, large-scale data on financial statements has become available in the form of open data sets. Previous work on such data mainly focused on predicting fraud and bankruptcy. In this paper we devise a model for business sector prediction, which has several valuable applications, including automated error and fraud detection. In addition, such a predictive model may help in completing similar datasets with missing sector information. The proposed method employs a supervised learning approach based on random forests that addresses business sector prediction as a classification task. Using a dataset from the Netherlands Chamber of Commerce, containing over 1.5 million financial statements from Dutch companies, we created an adequately-performing model for business sector prediction. By assessing which features are instrumental in the final classification model, we found that a small number of attributes is crucial for predicting the majority of business sectors. Interestingly, in some cases the presence or absence of a feature was more important than the value itself. The resulting insights may also prove useful in accounting, where the relation between financial statements and characteristics of the company is a frequently studied topic.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.