Extensive research has been done on the analytical and empirical examination of financial data in annual reports to detect fraud; however, there is scant research on the analysis of text in annual reports to detect fraud. The basic premise of this research is that there are clues hidden in the text that can be detected to determine the likelihood of fraud. In this research, we examine both the verbal content and the presentation style of the qualitative portion of the annual reports using natural language processing tools and explore linguistic features that distinguish fraudulent annual reports from nonfraudulent annual reports. Our results indicate that the use of linguistic features is an effective means of detecting fraud. We were able to improve the prediction accuracy of our fraud detection model from an initial baseline of 56.75 percent, using a "bag of words" approach, to 89.51 percent when we incorporated linguistically motivated features informed by our reasoning and domain knowledge.
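For readers unfamiliar with the "bag of words" baseline mentioned above, a minimal sketch follows. It assumes simple lowercase alphabetic tokenization and raw term counts; the function names are hypothetical, and this is not the authors' implementation. Each document becomes a vector of word counts over a shared vocabulary, with word order discarded.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents: list[str]) -> list[str]:
    """Collect every distinct token in the corpus, sorted so column order is stable."""
    vocab: set[str] = set()
    for doc in documents:
        vocab.update(tokenize(doc))
    return sorted(vocab)


def bag_of_words(document: str, vocabulary: list[str]) -> list[int]:
    """Map a document to raw term counts over the vocabulary; word order is discarded."""
    counts = Counter(tokenize(document))
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    docs = ["Revenue grew strongly.", "Revenue declined."]
    vocab = build_vocabulary(docs)
    print(vocab)                         # ['declined', 'grew', 'revenue', 'strongly']
    print(bag_of_words(docs[0], vocab))  # [0, 1, 1, 1]
```

Because such vectors carry no information about syntax or style, it is plausible that adding linguistically motivated features would capture signals this representation misses.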
SUMMARY We present a novel approach for analysing the qualitative content of annual reports. Using natural language processing techniques, we determine whether the sentiment expressed in the text matters in fraud detection. We focus on the Management Discussion and Analysis (MD&A) section of annual reports because, unlike other components of the annual reports, this section contains nonfactual content. We measure the sentiment expressed in the text on the dimensions of polarity, subjectivity and intensity, and investigate in depth whether truthful and fraudulent MD&As differ along each of these dimensions. Our results show that fraudulent MD&As on average contain three times more positive sentiment and four times more negative sentiment than truthful MD&As. This suggests that the use of both positive and negative sentiment is more pronounced in fraudulent MD&As. We further find that, compared with truthful MD&As, fraudulent MD&As contain a greater proportion of subjective content than objective content. This suggests that subjectivity clues, such as a high frequency of adjectives and adverbs, could be an indicator of fraud. Clear cases of fraud show a higher intensity of sentiment, expressed through more frequent use of adverbs in the "adverb modifying adjective" pattern. Based on the results of this study, frequent use of intensifiers, particularly in this pattern, could be another indicator of fraud. Moreover, the dimensions of subjectivity and intensity help in accurately classifying borderline MD&As (those with equal sentiment polarity) as fraudulent or truthful. Taken together, these findings suggest that fraudulent MD&As contain more sentiment-laden content than truthful MD&As.
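The "adverb modifying adjective" intensity measure described above can be illustrated with a small sketch. A real pipeline would identify adverbs and adjectives with a part-of-speech tagger; here, purely as an assumption for illustration, two tiny hand-picked word lists (`INTENSIFIER_ADVERBS` and `ADJECTIVES`, both hypothetical) stand in for the tagger. The sketch counts adjacent adverb-adjective pairs and normalizes by document length so that MD&As of different sizes can be compared.

```python
import re

# Hypothetical toy lexicons standing in for a part-of-speech tagger; illustrative only.
INTENSIFIER_ADVERBS = {"very", "extremely", "highly", "significantly", "exceptionally"}
ADJECTIVES = {"strong", "positive", "successful", "favorable", "difficult", "weak"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_adverb_adjective_pairs(text: str) -> int:
    """Count bigrams where an intensifying adverb is immediately followed by an adjective."""
    tokens = tokenize(text)
    return sum(
        1
        for first, second in zip(tokens, tokens[1:])
        if first in INTENSIFIER_ADVERBS and second in ADJECTIVES
    )


def intensity_rate(text: str) -> float:
    """Pairs per token, so documents of different lengths are comparable."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return count_adverb_adjective_pairs(text) / len(tokens)


if __name__ == "__main__":
    sample = "Results were extremely strong and highly successful this year."
    print(count_adverb_adjective_pairs(sample))  # 2
    print(intensity_rate(sample))                # 2 pairs / 9 tokens
```

Swapping the word lists for tagger output would generalize the pattern beyond the listed words; the counting logic would stay the same.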
SUMMARY Unlike most previous fraud detection research, which has focused primarily on the use of quantitative financial information to predict fraud, in this study we examine qualitative textual content in annual reports to predict fraud and determine whether there are discernible differences in writing and presentation style between companies that committed fraud and those that did not. We believe that while numeric financial information in annual reports can hide details of fraud, textual information relating to writing and presentation style in such reports provides valuable clues to the existence of fraud. In this study we use the chi‐square test to analyse our data and to test hypotheses about predictors of fraud that may explain variations in linguistic features between fraudulent and nonfraudulent annual reports. We provide new results on the usefulness of the qualitative content of annual reports in detecting fraud. Copyright © 2012 John Wiley & Sons, Ltd.
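To make the chi-square approach concrete, the sketch below computes Pearson's chi-square statistic of independence for a contingency table, for example rows for fraudulent and nonfraudulent reports and columns for whether a given linguistic feature occurs frequently. The counts in the example are invented for illustration and are not data from the study. No continuity correction is applied, and 3.841 is the standard critical value for one degree of freedom at the 0.05 significance level.

```python
# Critical value of the chi-square distribution with 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841


def chi_square_statistic(table: list[list[float]]) -> float:
    """Pearson chi-square statistic (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table: list[list[float]]) -> int:
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


if __name__ == "__main__":
    # Hypothetical counts: rows = fraudulent, nonfraudulent;
    # columns = feature frequent, feature infrequent.
    observed = [[30, 10], [15, 25]]
    stat = chi_square_statistic(observed)
    print(round(stat, 3), degrees_of_freedom(observed))  # 11.429 1
    print("reject independence:", stat > CHI2_CRITICAL_DF1_ALPHA05)  # True
```

For a 2 x 2 table, library routines such as SciPy's `chi2_contingency` apply Yates' continuity correction by default, so their statistic would differ slightly from this uncorrected version.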
Textual documents proliferate throughout accounting practice, and a wide variety of groups make financial decisions based on written guidance. The Generally Accepted Accounting Principles (GAAP), along with annual corporate financial statements and other reporting narratives, are important sources of such guidance and information. This paper examines the literature in two major areas relevant to text analytics and information retrieval in the accounting domain: (1) the manual and computational content analysis of accounting narratives, accounting readability studies, and related text-mining work, and (2) the information retrieval literature stream that addresses the extraction of both text elements and quantities embedded in text from accounting documents, including the impact of understanding the accounting lexicon on retrieval from digital accounting documents. We use the goals in developing the GAAP Codification, as expressed by the Financial Accounting Standards Board (FASB) in its Notice to Constituents (FASB 2009), as a starting point for reviewing the literature. The paper concludes with a map for suggested future research in accounting text analytics and information retrieval.
This chapter focuses on the detection of fraud in organizations by applying content-based analysis to the annual reports issued by firms. Unlike much previous work on fraud detection, which has used quantitative financial information, this research examines the qualitative textual content of annual reports to uncover evidence of fraud embedded in these reports through careful examination of tone, content, and emphasis across reports. The basic premise of this research is that organizations tend to camouflage negative findings so that they sound less damaging. The real intent of the writer is hidden in the content but can be revealed through structured content analysis. Using a corpus of annual reports from companies where fraud occurred, juxtaposed with reports from companies where no fraud was detected, this study systematically examines differences in the use of language. The results reveal that fraudulent annual reports exhibit themes of optimism, variety, complexity, activity, and passivity. By contrast, nonfraudulent annual reports exhibit themes of certainty and realism.