Detecting financial fraud to profile crimes and pinpoint system vulnerabilities is an essential task in the financial industry. Because of interpretability requirements and the lack of mass transaction data due to privacy regulations, much of the fraud detection literature relies on sophisticated handcrafted features. In addition to established recency, frequency, monetary, and anomaly features, we propose behavior- and segmentation-type features based on statistical characteristics that belong solely to (non-)fraudulent accounts and are informed by financial expertise. Our proposed features are difficult for automatic feature generators to synthesize, and they provide transparent cause-effect relationships and good prediction results. Features with time-inhomogeneous properties cause popular boosting classifiers such as XGBoost and LGBM to produce unstable detection results. We use the Kolmogorov-Smirnov test to detect and remove these features, which improves the detection performance and robustness of XGBoost and LGBM. In our experiments, the resulting performance is better than that of other classifiers, such as SVM and random forests. We examine the advantage of our technique by comparing it with several feature engineering works on fraud detection and with automatic feature generation methods. We also find that generating training/testing sets by random sampling falsely eliminates such time inhomogeneity and results in misleading assessments of the robustness of machine learning models. These time-inhomogeneous phenomena also entail various modus operandi patterns, which influence the performance of different resampling methods for addressing data imbalance in fraud detection. Because the patterns of modi operandi vary, the linear interpolation used by SMOTE-related approaches is inappropriate and leads to poor performance. However, synthesizing fraudulent samples with simple oversampling and GANs mitigates this problem.
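The Kolmogorov-Smirnov filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.05 significance level, and the choice to compare an earlier period against a later one are all assumptions made for illustration. The p-value uses the standard large-sample Kolmogorov approximation; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value from the Kolmogorov distribution (large samples)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges poorly here and the p-value is essentially 1.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))


def stable_features(early, late, alpha=0.05):
    """Keep features whose early/late distributions are not detectably different.

    `early` and `late` map a (hypothetical) feature name to its values in an
    earlier and a later time period. `alpha` is an assumed threshold.
    """
    kept = []
    for name in early:
        d = ks_statistic(early[name], late[name])
        p = ks_pvalue(d, len(early[name]), len(late[name]))
        if p >= alpha:  # no evidence of drift over time: keep the feature
            kept.append(name)
    return kept


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: "drifting" shifts its mean between the two periods.
    early = {"steady": [random.gauss(0, 1) for _ in range(500)],
             "drifting": [random.gauss(0, 1) for _ in range(500)]}
    late = {"steady": [random.gauss(0, 1) for _ in range(500)],
            "drifting": [random.gauss(1, 1) for _ in range(500)]}
    print(stable_features(early, late))
```

The split into `early` and `late` is chronological on purpose: as the abstract notes, randomly sampled training/testing sets would mix the periods and hide the very drift the test is meant to expose.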
The convenience of modern money transfer services attracts fraudulent actors, who run scams in which victims are deceived into transferring funds to fraudulent accounts. Machine learning models are broadly applied because traditional rule-based approaches detect fraud poorly. Learning directly from raw transaction data is impractical due to its high-dimensional nature; most studies instead construct features by extracting patterns from raw transaction data. Past literature categorizes these features into recency, frequency, monetary, and anomaly detection features. We use various machine learning algorithms to examine the performance of features in these four categories on real transaction data, and we compare them with the performance of our feature generation guideline, which is based on the statistical perspectives and characteristics of (non-)fraudulent accounts. The results show that, except for the monetary category, the feature categories used in the literature perform poorly regardless of which machine learning algorithm is used; anomaly detection features perform the worst. We find that even statistical features generated based on financial knowledge yield limited performance on a real transaction dataset. Our atypical detection characteristic of normal accounts improves the ability to distinguish them from fraudulent accounts and hence improves the overall detection results, outperforming other existing methods.
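As a rough illustration of the recency, frequency, and monetary categories mentioned above, the sketch below derives one feature of each kind per account from raw transactions. The abstract does not define the features used in the study, so the field layout `(account, timestamp, amount)`, the 30-day window, and the feature names are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rfm_features(transactions, as_of, window_days=30):
    """Compute simple recency/frequency/monetary features per account.

    `transactions` is an iterable of (account, timestamp, amount) tuples,
    a hypothetical layout. Only transactions inside the look-back window
    ending at `as_of` are used.
    """
    window_start = as_of - timedelta(days=window_days)
    per_account = defaultdict(list)
    for account, ts, amount in transactions:
        if window_start <= ts <= as_of:
            per_account[account].append((ts, amount))

    features = {}
    for account, rows in per_account.items():
        last_ts = max(ts for ts, _ in rows)
        features[account] = {
            # Recency: days since the most recent transaction in the window.
            "recency_days": (as_of - last_ts).total_seconds() / 86400.0,
            # Frequency: number of transactions in the window.
            "frequency": len(rows),
            # Monetary: total and largest amount in the window.
            "monetary_total": sum(amount for _, amount in rows),
            "monetary_max": max(amount for _, amount in rows),
        }
    return features


if __name__ == "__main__":
    sample = [
        ("A", datetime(2024, 1, 10), 100.0),
        ("A", datetime(2024, 1, 20), 50.0),
        ("B", datetime(2023, 11, 1), 10.0),  # outside the 30-day window
    ]
    print(rfm_features(sample, as_of=datetime(2024, 1, 30)))
```

Features like these compress a high-dimensional transaction history into a few numbers per account, which is the motivation the abstract gives for feature construction over learning from raw data.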