Summary
We analyze the text of the US Securities and Exchange Commission's accounting and auditing enforcement releases (AAERs). Our research question is: Did the Sarbanes–Oxley Act of 2002 (SOX) affect the qualitative linguistic content of AAERs in the post‐SOX period? To answer this question, we test the null hypotheses that the qualitative verbiage and sentiment of AAERs do not differ between the two time periods that we study: pre‐SOX and post‐SOX. We applied several text‐mining methods and machine‐learning classification methods. We first used two basic text‐mining methods, bag‐of‐words generation and topic modeling, for descriptive analysis of the AAER content before and after the enactment of SOX. We then conducted sentiment analysis on the content of the two time periods using four sentiment dictionaries. Finally, we developed three classification models based on well‐known supervised learning algorithms and determined that SOX‐related AAERs could be distinguished from non‐SOX‐related AAERs. Based on the results, we conclude that the AAER content of the post‐SOX period differs significantly from that of the pre‐SOX period; that is, SOX‐related AAERs are qualitatively different from non‐SOX‐related AAERs in both linguistic content and sentiment values.
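The three analysis stages named above (bag of words, dictionary‐based sentiment, and supervised classification) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy sentences stand in for AAER text, `NEGATIVE_WORDS` is a hypothetical mini‐lexicon rather than one of the four sentiment dictionaries, and multinomial naive Bayes is chosen only as one well‐known supervised algorithm, since the summary does not name the three models that were used. Topic modeling is omitted for brevity.

```python
import math
import re
from collections import Counter

# Toy sentences standing in for AAER text; invented for illustration only.
PRE_SOX = [
    "the company overstated revenue and misled investors",
    "the respondent recorded fictitious revenue in its filings",
]
POST_SOX = [
    "the officer falsely certified internal controls over financial reporting",
    "the audit committee failed to review internal controls",
]

# Hypothetical mini-lexicon; not one of the sentiment dictionaries used in the study.
NEGATIVE_WORDS = {"overstated", "misled", "fictitious", "falsely", "failed"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Count how often each token occurs across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def negative_share(docs):
    """Fraction of all tokens that appear in the negative-word lexicon."""
    tokens = [tok for doc in docs for tok in tokenize(doc)]
    return sum(tok in NEGATIVE_WORDS for tok in tokens) / len(tokens)


def train_naive_bayes(labelled_docs):
    """Fit a multinomial naive Bayes model from {label: [documents]}."""
    word_counts = {label: bag_of_words(docs) for label, docs in labelled_docs.items()}
    vocab = set().union(*word_counts.values())
    total_docs = sum(len(docs) for docs in labelled_docs.values())
    log_priors = {
        label: math.log(len(docs) / total_docs)
        for label, docs in labelled_docs.items()
    }
    return word_counts, vocab, log_priors


def classify(model, text):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, vocab, log_priors = model
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = log_priors[label]
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes({"pre": PRE_SOX, "post": POST_SOX})
```

Calling `bag_of_words(PRE_SOX).most_common(5)` gives a descriptive view of one period, comparing `negative_share(PRE_SOX)` with `negative_share(POST_SOX)` mirrors the dictionary‐based sentiment comparison, and `classify(model, "the officer certified internal controls")` assigns a new text to one of the two periods. A real analysis would need a held‐out test set and a significance test before drawing any conclusion about differences between the periods.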