Intelligent Fraud Detection in Financial Statements Using Machine Learning and Data Mining: A Systematic Literature Review

Ashtiani, Matin N.; Raahemi, Bijan

doi:10.1109/access.2021.3096799

Cited by 71 publications

(37 citation statements)

References 69 publications

“…support vector machine, decision tree and random forest) and regression algorithms (e. g. linear regression and logistic regression). The unsupervised learning approach analyzes unlabeled data sets and includes methods such as clustering and association (see, for example, Ashtiani and Raahemi, 2021). According to these explanations, it can be argued that the keywords of the first context are related to the title of "fraud detection techniques" for cluster one.…”

Section: Topic Modeling Approach (mentioning)

confidence: 99%

Two decades of financial statement fraud detection literature review; combination of bibliometric analysis and topic modeling approach

Soltani

Kythreotis

Roshanpoor

2023

JFC

Purpose: The emergence of machine learning has opened a new way for researchers. It allows them to supplement the traditional manual methods for conducting a literature review and turn it into a smart literature review. This study aims to present a framework for incorporating machine learning into financial statement fraud (FSF) literature analysis. This framework facilitates the analysis of a large amount of literature to show the trend of the field and identify the most productive authors, journals and potential areas for future research.

Design/methodology/approach: In this study, a framework was introduced that merges bibliometric analysis techniques such as word frequency, co-word analysis and coauthorship analysis with the Latent Dirichlet Allocation topic modeling approach. This framework was used to uncover subtopics from 20 years of financial fraud research articles. Furthermore, the hierarchical clustering method was used on selected subtopics to demonstrate the primary contexts in the literature on FSF.

Findings: This study has contributed to the literature in two ways. First, this study has determined the top journals, articles, countries and keywords based on various bibliometric metrics. Second, using topic modeling and then hierarchical clustering, this study demonstrates the four primary contexts in FSF detection.

Research limitations/implications: In this study, the authors tried to comprehensively view the studies related to financial fraud conducted over two decades. However, this research has limitations that can be an opportunity for future researchers. The first limitation is due to language bias. This study has focused on English language articles, so it is suggested that other researchers consider other languages as well. The second limitation is caused by citation bias. In this study, the authors tried to show the top articles based on the citation criteria. However, judging based on citation alone can be misleading. Therefore, this study suggests that researchers consider other measures to check the citation quality and assess the studies’ precision by applying meta-analysis.

Originality/value: Despite the popularity of bibliometric analysis and topic modeling, there have been limited efforts to use machine learning for literature review. This novel approach of using hierarchical clustering on topic modeling results enables us to uncover four primary contexts. Furthermore, this method allowed us to show the keywords of each context and highlight significant articles within each context.
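The abstract above names word frequency and co-word analysis among its bibliometric techniques. As a rough illustration only, and not the authors' implementation, the sketch below counts keyword frequencies and keyword co-occurrences over a small, made-up set of article keyword lists. The data and the function names keyword_frequency and co_word_counts are hypothetical; the topic modeling and hierarchical clustering steps are not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per article (illustrative, not data from the study).
articles = [
    ["fraud detection", "machine learning", "financial statements"],
    ["fraud detection", "data mining", "financial statements"],
    ["machine learning", "topic modeling"],
]


def keyword_frequency(keyword_lists):
    """Count how many articles mention each keyword (word frequency)."""
    counts = Counter()
    for keywords in keyword_lists:
        # set() so a keyword repeated within one article is counted once
        counts.update(set(keywords))
    return counts


def co_word_counts(keyword_lists):
    """Count how many articles mention each unordered keyword pair (co-word analysis)."""
    pairs = Counter()
    for keywords in keyword_lists:
        # sorted() gives every pair a canonical (alphabetical) order
        for first, second in combinations(sorted(set(keywords)), 2):
            pairs[(first, second)] += 1
    return pairs


freq = keyword_frequency(articles)
pairs = co_word_counts(articles)

print(freq["fraud detection"])
print(pairs[("financial statements", "fraud detection")])
```

Pairs with high counts would form the heavier edges of a co-word network; in a real analysis the keyword lists would come from the bibliographic records rather than being typed in by hand.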

“…ensemble.RandomForestClassifier.html, November 2022. 21 Available at https://keras.io/api/models/sequential, November 2022. 22 Available at https://keras.io/api/layers/recurrent_layers/lstm, November 2022.…”

Section: A Experimental Data-set (mentioning)

confidence: 99%

Explainable Automatic Industrial Carbon Footprint Estimation From Bank Transaction Classification Using Natural Language Processing

et al. 2022

Concerns about the effect of greenhouse gases have motivated the development of certification protocols to quantify the industrial carbon footprint (CF). These protocols are manual, work-intensive, and expensive. All of the above have led to a shift towards automatic data-driven approaches to estimate the CF, including Machine Learning (ML) solutions. Unfortunately, as in other sectors of interest, the decision-making processes involved in these solutions lack transparency from the end user's point of view, who must blindly trust their outcomes compared to intelligible traditional manual approaches. In this research, manual and automatic methodologies for CF estimation were reviewed, taking into account their transparency limitations. This analysis led to the proposal of a new explainable ML solution for automatic CF calculations through bank transaction classification. Consideration should be given to the fact that no previous research has considered the explainability of bank transaction classification for this purpose. For classification, different ML models have been employed based on their promising performance in similar problems in the literature, such as Support Vector Machine, Random Forest, and Recursive Neural Networks. The results obtained were in the 90% range for accuracy, precision, and recall evaluation metrics. From their decision paths, the proposed solution estimates the CO2 emissions associated with bank transactions. The explainability methodology is based on an agnostic evaluation of the influence of the input terms extracted from the descriptions of transactions using locally interpretable models. The explainability terms were automatically validated using a similarity metric over the descriptions of the target categories. Conclusively, the explanation performance is satisfactory in terms of the proximity of the explanations to the associated activity sector descriptions, endorsing the trustworthiness of the process for a human operator and end users.

“…A variety of methods, including DM, decision trees, rule depend mining, neural networks, clustering of fuzzy, and ML, will be used by banks and credit card firms during COVID-19 in an effort to catch fraudsters red-handed. Based on previous activity, the technique attempts to determine a customer's regular usage pattern (Ashtiani and Raahemi, 2021, Khan et al, 2021, Adday et al, 2021). The purpose of this research is to suggest a mechanism for detecting such fraud transactions in such an uncontrolled pandemic situation.…”

Section: Introduction (mentioning)

confidence: 99%

Advancement of management information system for discovering fraud in master card based intelligent supervised machine learning and deep learning during SARS-CoV2

Al-Ghamdi

et al. 2023

Information Processing & Management
