We quantify the propagation and absorption of large-scale publicly available news articles from the World Wide Web to financial markets. To extract publicly available information, we use the news archives from the Common Crawl, a non-profit organization that crawls a large part of the web. We develop a processing pipeline to identify news articles associated with the constituent companies in the S&P 500 index, an equity market index that measures the stock performance of US companies. Using machine learning techniques, we extract sentiment scores from the Common Crawl News data and employ tools from information theory to quantify the information transfer from public news articles to the US stock market. Furthermore, we analyse and quantify the economic significance of the news-based information with a simple sentiment-based portfolio trading strategy. Our findings provide support for that information in publicly available news on the World Wide Web has a statistically and economically significant impact on events in financial markets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.