We address a text regression problem: given a piece of text, predict a real-world continuous quantity associated with the text's meaning. In this work, the text is an SEC-mandated financial report published annually by a publicly traded company, and the quantity to be predicted is the volatility of stock returns, an empirical measure of financial risk. We apply well-known regression techniques to a large corpus of freely available financial reports, constructing regression models of volatility for the period following a report. Our models rival past volatility (a strong baseline) in predicting the target variable, and a single model that uses both can significantly outperform past volatility. Interestingly, our approach is more accurate for reports after the passage of the Sarbanes-Oxley Act of 2002, giving some evidence for the success of that legislation in making financial reports more informative.
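To make the text regression setup concrete, here is a minimal sketch, not the authors' implementation. It assumes volatility is measured as the log of the sample standard deviation of returns, uses log(1 + count) bag-of-words features, and lets ridge regression fitted by gradient descent stand in for the unspecified "well-known regression techniques". The report snippets, return series, and all function names are hypothetical and invented purely for illustration.

```python
import math
from collections import Counter
from statistics import stdev


def log_volatility(returns):
    """Log of the sample standard deviation of a return series (assumed target)."""
    return math.log(stdev(returns))


def bag_of_words(text, vocab):
    """Vector of log(1 + term count) over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [math.log1p(counts[w]) for w in vocab]


def fit_ridge(X, y, lam=0.1, lr=0.05, epochs=2000):
    """Gradient descent on mean squared error plus an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not penalized
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Linear prediction of log volatility from a feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy data (hypothetical): report text paired with returns observed afterwards.
reports = [
    "risk litigation loss uncertainty",
    "growth dividend stable growth",
    "loss default risk risk",
    "stable dividend income",
]
future_returns = [
    [0.05, -0.06, 0.04, -0.05],
    [0.01, -0.01, 0.005, -0.005],
    [0.08, -0.07, 0.06, -0.09],
    [0.004, -0.003, 0.002, -0.004],
]

vocab = sorted({word for text in reports for word in text.split()})
X = [bag_of_words(text, vocab) for text in reports]
y = [log_volatility(r) for r in future_returns]
w, b = fit_ridge(X, y)

risky = predict(w, b, bag_of_words("risk loss", vocab))
calm = predict(w, b, bag_of_words("stable dividend", vocab))
```

On this toy data the fitted model assigns a higher predicted log volatility to text with risk-related words than to text with stability-related words. A model that combines text with past volatility, as the abstract describes, could be sketched by appending the previous period's log volatility as one more feature.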
Abstract: A basic tenet of financial economics is that asset prices change in response to unexpected fundamental information. Since Roll's (1988) provocative presidential address that showed little relation between stock prices and news, however, the finance literature has had limited success reversing this finding. This paper revisits this topic in a novel way. Using advancements in the area of textual analysis, we are better able to identify relevant news, both by type and by tone. Once news is correctly identified in this manner, there is considerably more evidence of a strong relationship between stock price changes and information. For example, market model R²s are no longer the same on news versus no news days (i.e., Roll's (1988) infamous result), but now are 16% versus 33%; variance ratios of returns on identified news versus no news days are 120% higher versus only 20% for unidentified news versus no news; and, conditional on extreme moves, stock price reversals occur on no news days, while identified news days show an opposite effect, namely a strong degree of continuation. A number of these results are strengthened further when the tone of the news is taken into account by measuring the positive/negative sentiment of the news story.
Corresponding author: Shimon Kogan, GSB 5.159, McCombs School of Business, University of Texas at Austin, 1 University Station, B6600, Austin, TX 78712, Tel: +1 (512) 232-6839, email shimon.kogan@austin.utexas.edu. We would like to thank John Griffin and seminar participants at the University of Texas, Austin, and Stern NYU for their comments and suggestions.