Companies have incentives to hide, omit, or falsify the information reported in financial statements (FS) (e.g., Balance Sheet, Income Statement, Cash Flow Statement) to give a false impression of the company's financial health, reassure investors, or evade taxes. Typically, misinformation is introduced by changing FS elements, e.g., overstating assets/profits or understating liabilities/losses. Once detected, misinformation can have disastrous consequences for employees, investors, banks, and the government. It is therefore important to identify such companies as well as the nature and extent of the misinformation in their FS. Auditors or forensic accountants use complex investigative methods to detect instances of misinformation in FS. The effort-intensive and subjective nature of these methods limits their capacity to effectively identify misinformation. We propose two novel unsupervised model-based anomaly detection (AD) techniques based on regression and kernel density estimates. We show that they outperform 15 standard AD techniques and data envelopment analysis in detecting suspicious FS on a real-world dataset of 4100 listed companies. Our approach provides specific suggestions regarding where the misinformation may be present, which helps increase the effectiveness of investigations.
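To illustrate the kernel-density branch of this idea, the following is a minimal sketch, not the paper's method: it scores companies by how unlikely their FS-derived features are under a fitted density. The feature matrix, bandwidth, and cutoff below are hypothetical.

```python
# Minimal sketch (assumed setup, not the paper's exact technique): score
# financial-statement records as anomalous via a kernel density estimate,
# assuming each row holds ratio features derived from FS elements.
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.preprocessing import StandardScaler

def kde_anomaly_scores(X, bandwidth=0.5):
    """Return anomaly scores (higher = more suspicious) for rows of X."""
    Xs = StandardScaler().fit_transform(X)          # put features on one scale
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(Xs)
    log_density = kde.score_samples(Xs)             # log p(x) under the KDE
    return -log_density                             # low density -> high score

# Hypothetical usage: X is an (n_companies, n_features) matrix of FS-derived
# ratios; flag the handful of companies with the lowest estimated density.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
scores = kde_anomaly_scores(X)
suspicious = np.argsort(scores)[-5:]                # five most anomalous rows
print(suspicious)
```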
Financial audits establish trust in the governance and processes of an organization, but they are time-consuming and knowledge-intensive. To increase the effectiveness of financial audits, we address the task of generating audit suggestions that help auditors focus their investigations. Specifically, we present NLP techniques to extract hidden knowledge from a corpus of past financial audit reports of many companies and use it to generate audit suggestions. The extracted knowledge consists of a set of automatically identified sentences containing adverse remarks, the financial variables mentioned in each sentence, and automatically assigned XBRL categories for them, XBRL being a standardized taxonomy in the financial domain. In the absence of suitable labeled data, we adopt a weak supervision approach: we design a set of high-precision linguistic rules to identify adverse remark sentences, create automatically labeled training data using them, and train BERT-based and other classifiers to identify such sentences. We then present novel techniques (which are either unsupervised or zero-shot) to assign zero, one, or more XBRL categories to any given adverse remark sentence. We evaluate the proposed approaches against competitive baselines on a large corpus of real-life financial statements and audit reports. Given a company's financial statements (already identified as suspicious) and a subset of financial variables in them that contribute to the suspiciousness, we match these against the extracted knowledge base and identify aligned adverse remarks that help the auditor focus on specific directions for further investigation.
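The following is a minimal sketch of the weak-supervision step under stated assumptions: the regex patterns are hypothetical placeholders for the authors' hand-crafted high-precision rules, and a TF-IDF plus logistic-regression classifier stands in for the BERT-based models so the example stays self-contained.

```python
# Minimal sketch of the weak-supervision idea (assumed rules and model, not
# the paper's): label sentences with a few high-precision adverse-remark
# patterns, then train a classifier on the automatically labeled data.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical high-precision patterns; the actual rule set is hand-crafted
# by the authors and not reproduced here.
ADVERSE_PATTERNS = [
    r"\bmaterial weakness\b",
    r"\bgoing concern\b",
    r"\bmisstatement\b",
    r"\bnon-?compliance\b",
]

def weak_label(sentence):
    """Return 1 if any adverse-remark pattern fires, else 0."""
    return int(any(re.search(p, sentence.lower()) for p in ADVERSE_PATTERNS))

# Toy corpus standing in for sentences drawn from past audit reports.
sentences = [
    "The auditors identified a material weakness in internal controls.",
    "Revenue grew 12% over the prior year.",
    "There is substantial doubt about the company's going concern status.",
    "The board declared an interim dividend.",
]
labels = [weak_label(s) for s in sentences]         # automatically labeled data

# Stand-in classifier; the paper trains BERT-based and other classifiers.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(sentences, labels)
print(clf.predict(["Management noted a misstatement in inventory valuation."]))
```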