Owing to the theoretical work of Hill, Benford digital-profile testing is now a staple in screening data for forensic investigations and audit examinations. Prior empirical literature indicates that Benford testing, when applied to a large Benford Conforming dataset, often produces a bias called the FPE Screening Signal [FPESS] that misleads investigators into believing that the dataset is Non-Conforming. The same FPESS can also be observed when investigators partition large datasets into smaller datasets to address a variety of auditing questions. In this study, we fill an empirical gap in the literature by investigating the sensitivity of the FPESS to partitioned datasets. We randomly selected 16 balance-sheet datasets from the China Stock Market Financial Statements Database™ that tested as Benford Conforming; we denote these RBCD. We then explored how partitioning these datasets affects the FPESS by repeated random sampling: first drawing 10% of each RBCD and then selecting 250 observations from each RBCD. This created two partitioned groups of 160 datasets each. The statistical profile observed was as follows: for the RBCD there were no indications of Non-Conformity; for the 10%-Sample there were no overall indications that Extended Procedures would be warranted; and for the 250-Sample there were a number of indications that the dataset was Non-Conforming. This demonstrates clearly that small datasets are indeed likely to create the FPESS. We offer a discussion of these results with implications for audits in the Big-Data context, where the audit In-charge would find it necessary to partition the datasets of the client.
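The partitioning experiment described above can be illustrated with a minimal Python sketch. It is not the study's procedure: the population is synthetic (values of the form 10^U with U uniform, which follow the Benford first-digit profile), and a chi-square test at the 5% level stands in for whatever screening criteria the authors applied. The names `first_digit`, `chi_square_vs_benford`, and `flag_rate` are hypothetical.

```python
import math
import random
from collections import Counter

# Benford first-digit profile: P(d) = log10(1 + 1/d), d = 1..9
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom at alpha = 0.05.
# ASSUMPTION: chi-square is a stand-in test, not the study's screening rule.
CHI2_CRIT_8DF_05 = 15.507


def first_digit(x):
    """First significant digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


def chi_square_vs_benford(values):
    """Chi-square statistic of observed first digits against Benford."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in BENFORD.items())


def flag_rate(population, sample_size, trials, rng):
    """Share of random samples flagged as Non-Conforming by the stand-in test."""
    flags = 0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        if chi_square_vs_benford(sample) > CHI2_CRIT_8DF_05:
            flags += 1
    return flags / trials


if __name__ == "__main__":
    rng = random.Random(2024)
    # ASSUMPTION: synthetic Benford-conforming population, not the study's data.
    population = [10 ** rng.uniform(0, 6) for _ in range(20_000)]
    print("10% samples flagged:", flag_rate(population, len(population) // 10, 10, rng))
    print("250-obs samples flagged:", flag_rate(population, 250, 10, rng))
```

Ten draws per sample size mirrors the 16 × 10 = 160 datasets per group reported above; the flag rates printed depend on the seed and on the stand-in test, so they should not be read as a replication of the reported results.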
Bao, Lee, Heilig, and Lusk (2018) documented and illustrated the Small Sample Size bias in Benford screening of datasets for Non-Conformity. However, their sampling plan tested only a few random sample-bundles from a core set of data that clearly conformed to the Benford first-digit profile. We extended their study using the same core datasets and the same DSS, the Newcomb Benford Decision Support Systems Profiler [NBDSSP], to create an expanded set of random samples from their core sample. Specifically, we took repeated random samples, in blocks of 10, from their core set of data, stepping down to 5% in increments of 5%, and finished with random samples of 1%, 0.5%, and 20, thus creating 221 sample-bundles. This arm focuses on the False Positive Signaling Error [FPSE], i.e., believing that the sampled dataset is Non-Conforming when it in fact comes from a Conforming set of data. The second arm used the Hill Lottery dataset, argued and tested to be Non-Conforming; we used the same iteration model to create a test of the False Negative Signaling Error [FNSE], i.e., the case where the NBDSSP fails to detect Non-Conformity in the sampled datasets, so that the dataset is incorrectly believed to be Conforming. We find that there is a dramatic point in the sliding sampling scale, at about 120 sampled points, where the FPSE first appears, i.e., where a dataset whose true state of nature is Conforming is incorrectly flagged as Non-Conforming. Further, we find that the FNSE is very unlikely to manifest itself for the Hill dataset. This demonstrates clearly that small datasets are indeed likely to create the FPSE, and that there should be little concern that Hill-type datasets will fail to be indicated as Non-Conforming. We offer a discussion of these results with implications for audits in the Big-Data context, where the audit In-charge may find it necessary to partition the datasets of the client.
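The sliding sampling scale can be sketched as a schedule of sample sizes. The abstract does not spell out the schedule, so the Python below encodes one reading that happens to reproduce the stated total of 221 sample-bundles: the core set used once, ten draws at each level from 95% down to 5% in 5% steps, and ten draws each at 1%, 0.5%, and 20 observations. That reading, the interpretation of "20" as 20 observations, and the names `sampling_plan` and `draw_bundles` are all assumptions.

```python
import random


def sampling_plan(n_core, reps=10):
    """List of (label, sample_size) bundles under the assumed schedule."""
    plan = [("100%", n_core)]  # ASSUMPTION: the core set itself counts once
    for pct in range(95, 0, -5):  # 95%, 90%, ..., 5%
        plan += [(f"{pct}%", round(n_core * pct / 100))] * reps
    tail = (
        ("1%", round(n_core * 0.01)),
        ("0.5%", round(n_core * 0.005)),
        ("n=20", 20),  # ASSUMPTION: "20" means 20 observations
    )
    for label, size in tail:
        plan += [(label, size)] * reps
    return plan


def draw_bundles(core, rng, reps=10):
    """Draw one random sample (without replacement) per bundle in the plan."""
    return [
        (label, rng.sample(core, min(size, len(core))))
        for label, size in sampling_plan(len(core), reps)
    ]


if __name__ == "__main__":
    core = list(range(10_000))  # placeholder for a core dataset
    bundles = draw_bundles(core, random.Random(7))
    print(len(bundles), "sample-bundles")
```

Each bundle could then be passed to a conformity screen to locate the sample size at which false positive flags begin to appear.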