Classification learning is an important problem in machine learning and has been widely applied to financial distress warning. Some studies show that prediction frameworks based on sparse algorithms outperform traditional models. In this paper, we explore financial distress prediction based on group sparsity. Feature selection plays an important role in classification learning because redundant and irrelevant features degrade performance; a good feature selection algorithm reduces computational complexity and improves classification accuracy. We propose a feature selection and classification algorithm that groups variables by feature attribute and data source. Existing financial distress prediction models usually use only financial statement data and ignore the timeliness of company samples in practice. We therefore propose a corporate financial distress prediction model that is closer to practice: it applies group sparse principal component analysis to financial data, corporate governance characteristics, and market transaction data, and combines the result with a support vector machine. Experimental results show that this method can improve the prediction efficiency of financial distress with fewer characteristic variables.
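The grouping idea in this abstract can be illustrated with a minimal sketch: a separate sparse PCA is fitted on each block of variables, and the concatenated components are passed to an SVM. This is not the authors' implementation. The column groups, the number of components, the penalty `alpha`, and the synthetic data are all assumptions made for illustration, and scikit-learn's `SparsePCA` stands in for the paper's group sparse PCA.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import SparsePCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical column indices for the three data sources named in the abstract.
GROUPS = {
    "financial": [0, 1, 2, 3],
    "governance": [4, 5, 6],
    "market": [7, 8, 9],
}


def build_model(groups, n_components=2, alpha=0.5, seed=0):
    """Fit one sparse PCA per variable group, then classify with an SVM."""
    transformers = [
        (
            name,
            make_pipeline(
                StandardScaler(),
                SparsePCA(n_components=n_components, alpha=alpha, random_state=seed),
            ),
            cols,
        )
        for name, cols in groups.items()
    ]
    # ColumnTransformer stacks the per-group components side by side.
    return make_pipeline(ColumnTransformer(transformers), SVC(kernel="rbf"))


def make_synthetic(n=200, seed=0):
    """Synthetic stand-in data; the label depends on one variable per group."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + X[:, 4] - X[:, 7] > 0).astype(int)
    return X, y


X, y = make_synthetic()
model = build_model(GROUPS)
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])
reduced = model[:-1].transform(X[:5])  # 3 groups x 2 components per group
```

Fitting the decomposition per group keeps each component interpretable as belonging to one data source, which is the property the grouping is meant to preserve. Whether the paper couples the groups through a joint penalty is not stated in the abstract.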
Predicting the audit opinions of listed companies plays a significant role in risk prevention in the securities market. Machine learning methods can help improve audit quality, raise audit efficiency, and sharpen auditors' insight. In realistic settings, however, models for predicting company audit opinions face two problems: class imbalance and the selection of critical features. This paper first combines batched sparse principal component analysis (BSPCA) with the kernel fuzzy clustering algorithm (KFCM) and proposes a sparse-kernel fuzzy clustering undersampling method (S-KFCM) to deal with the imbalance of sample categories. The method uses kernel fuzzy clustering to down-sample the normal samples, and their features are extracted from abnormal sample sets based on the group sparse component method. The resulting sparse normal sample set maintains the spatial structure of the original distribution and highlights the samples near the classification boundary. Second, the 448 original variables are grouped according to the company's characteristic attributes and data sources, and BSPCA is then used for feature screening. Finally, a support vector machine (SVM) performs the classification prediction. According to the empirical results, the SKFCM-SVM model has the highest prediction accuracy.
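The undersampling step can be sketched as follows, using one common Gaussian-kernel formulation of fuzzy c-means with prototypes kept in input space. The abstract does not give the selection rule, so the rule used here (keep the highest-membership samples of each cluster, in proportion to cluster size) is an assumption, as are the function names, `sigma`, the cluster count, and the synthetic data.

```python
import numpy as np


def memberships(X, centers, m, sigma):
    """Fuzzy memberships U (n x c) and Gaussian kernel values K (n x c)."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / sigma ** 2)
    # Kernel-induced distance is proportional to (1 - K); clip to avoid 0 ** negative.
    inv = np.maximum(1.0 - K, 1e-12) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True), K


def kfcm(X, n_clusters, m=2.0, sigma=1.0, n_iter=50, seed=0):
    """Kernel fuzzy c-means: alternate membership and prototype updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        U, K = memberships(X, centers, m, sigma)
        W = (U ** m) * K  # kernel-weighted fuzzy weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    U, _ = memberships(X, centers, m, sigma)
    return U, centers


def undersample_majority(X_major, n_keep, n_clusters=3, **kfcm_kwargs):
    """Return indices of roughly n_keep majority samples, spread across clusters."""
    U, _ = kfcm(X_major, n_clusters, **kfcm_kwargs)
    labels = U.argmax(axis=1)
    keep = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        # Assumed rule: quota proportional to cluster size, highest membership first.
        quota = max(1, round(n_keep * idx.size / len(X_major)))
        keep.extend(idx[np.argsort(-U[idx, c])[:quota]].tolist())
    return np.sort(np.array(keep[:n_keep]))


rng = np.random.default_rng(0)
X_major = rng.normal(size=(300, 5))  # synthetic stand-in for the normal class
kept = undersample_majority(X_major, n_keep=30, sigma=3.0)
X_major_reduced = X_major[kept]
```

Because the quotas are rounded per cluster, the function can return slightly fewer than `n_keep` indices. The reduced majority set would then be combined with the minority samples before feature screening and SVM training.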