This study aims to develop an improved Financial Statement Fraud (FSF) detection model using publicly available financial statements of firms in the Middle East and North Africa (MENA) region. We develop an FSF model using a powerful ensemble technique, the XGBoost (eXtreme Gradient Boosting) algorithm, to identify fraud in a sample of companies drawn from the MENA region. Class imbalance in the dataset is addressed by applying the Synthetic Minority Oversampling Technique (SMOTE). We use several Machine Learning techniques in Python to predict FSF, and our empirical findings show that the XGBoost algorithm outperformed the other algorithms in this study, namely Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), AdaBoost, and Random Forest (RF). We then optimize the XGBoost algorithm to obtain the best result, with a final accuracy of 96.05% in the detection of FSF.
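The oversampling step mentioned above can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. This is not the authors' code. The function name `smote`, the toy two-feature `fraud` data, and the parameter values are all assumptions made for illustration, and a real pipeline would more likely rely on an existing library implementation before training a classifier.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative only: a simplified version of the SMOTE idea, not a
    drop-in replacement for a library implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # Place the new point somewhere on the segment base -> other.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: four fraud cases described by two made-up ratios.
fraud = [[0.9, 0.1], [1.0, 0.2], [0.8, 0.15], [0.95, 0.05]]
non_fraud_count = 20  # assumed size of the majority class

new_points = smote(fraud, n_new=non_fraud_count - len(fraud), k=3)
balanced_fraud = fraud + new_points
```

After this step the minority class has as many rows as the assumed majority class. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.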
Fraudulent financial statements involve the deliberate furnishing and/or reporting of incorrect statistics. This has become a major economic and social concern, as the global market is witnessing an upsurge in financial accounting fraud that costs businesses billions of dollars a year. Identifying companies that manipulate financial statements remains a challenge for auditors, as fraud strategies have become increasingly sophisticated over the years. We evaluate machine learning techniques for financial statement fraud detection, particularly a powerful ensemble technique, the XGBoost algorithm, that helps to identify fraud in a sample of companies drawn from the MENA region. The issue of class imbalance in the dataset is addressed by applying the SMOTE algorithm. We found that the XGBoost algorithm outperformed the other algorithms in this study: Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), AdaBoost, and Random Forest (RF). The XGBoost algorithm is then optimised to obtain the best performance.
Despite the obvious benefits and growing popularity of Machine Learning (ML) technology, there are still concerns regarding its ability to provide Financial Distress Prediction (FDP). An accurate FDP model is required to avoid financial risk at the lowest possible cost. However, in the Internet era, financial data are growing rapidly and are being coupled with other kinds of risk data, making an FDP model challenging to operate. As a result, researchers have presented several novel FDP models based on ML and Deep Learning. Time series data are important for reflecting the multi-source and heterogeneous aspects of financial data. This paper gives insight into building a time-series model and forecasting distress well in advance of its occurrence. To build an efficient FDP model, we propose a hybrid model (GALSTM-FDP) that combines Long Short-Term Memory (LSTM) with a Genetic Algorithm (GA). Unlike previous studies, which established models that predicted distress probability only within one year, our approach predicts distress two years ahead. This research integrates GA with LSTM to find the optimum hyperparameter configuration for LSTM. Using GA, we optimize the architectural parameters of the network, with prediction accuracy as the selection criterion. The results showed that our algorithm outperforms other state-of-the-art methods in terms of predictive accuracy.
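The idea of using a GA to search LSTM hyperparameters can be sketched as follows. This is a generic illustration, not the GALSTM-FDP implementation: the search space, the operator choices (tournament selection, uniform crossover, random-reset mutation, elitism), and every name here are assumptions. In particular, `toy_fitness` is a placeholder that stands in for training an LSTM and returning its validation accuracy.

```python
import random

# Hypothetical search space; the names and ranges are assumptions.
SEARCH_SPACE = {
    "units": [16, 32, 64, 128],
    "layers": [1, 2, 3],
    "dropout": [0.0, 0.2, 0.4],
    "window": [4, 8, 12],
}


def toy_fitness(genome):
    """Placeholder for the validation accuracy of an LSTM built from genome.

    A real run would train and evaluate the network here; this stand-in
    simply rewards closeness to an arbitrary target configuration.
    """
    target = {"units": 64, "layers": 2, "dropout": 0.2, "window": 8}
    return sum(genome[key] == target[key] for key in target) / len(target)


def random_genome(rng):
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal odds.
    return {key: (a[key] if rng.random() < 0.5 else b[key]) for key in SEARCH_SPACE}


def mutate(genome, rng, rate=0.1):
    # Random-reset mutation: occasionally resample a gene from its range.
    return {
        key: (rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value)
        for key, value in genome.items()
    }


def tournament(population, fitness, rng, size=3):
    # Pick the fittest of a small random subset of the population.
    return max(rng.sample(population, size), key=fitness)


def genetic_search(fitness, pop_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best genome always survives
        while len(children) < pop_size:
            parent_a = tournament(population, fitness, rng)
            parent_b = tournament(population, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


best_genome, best_score = genetic_search(toy_fitness)
```

Because each fitness evaluation would mean a full training run in practice, the population size and number of generations largely determine the cost of the search. Elitism keeps the best score from decreasing between generations.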