Our research examines a predictive machine learning approach for financial news article analysis using several different textual representations: bag of words, noun phrases, and named entities. Through this approach, we investigated 9,211 financial news articles and 10,259,042 stock quotes covering the S&P 500 stocks during a five-week period. We applied our analysis to estimate a discrete stock price twenty minutes after a news article was released. Using a support vector machine (SVM) derivative specially tailored for discrete numeric prediction and models containing different stock-specific variables, we show that the model containing both article terms and stock price at the time of article release had the best performance in closeness to the actual future stock price (MSE 0.04261), in predicting the same direction of price movement as the future price (57.1% directional accuracy), and in return using a simulated trading engine (2.06% return). We further investigated the different textual representations and found that a Proper Noun scheme performs better than the de facto standard of Bag of Words in all three metrics.
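The three evaluation metrics named above (closeness, directional accuracy, and simulated trading return) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Prediction` record, the function names, the 1% trading threshold, and the fixed stake per trade are all assumptions introduced here for illustration, and the abstract does not state how prices were scaled before the MSE was computed.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One article-triggered prediction (hypothetical record layout)."""
    price_at_release: float  # stock price when the article was released
    predicted_price: float   # model's estimate for 20 minutes later
    actual_price: float      # observed price 20 minutes later


def closeness_mse(preds):
    """Closeness: mean squared error between predicted and actual price."""
    return sum((p.predicted_price - p.actual_price) ** 2 for p in preds) / len(preds)


def directional_accuracy(preds):
    """Share of predictions whose predicted move has the same sign as the
    actual move. A zero move on either side counts as a miss."""
    hits = sum(
        1
        for p in preds
        if (p.predicted_price - p.price_at_release)
        * (p.actual_price - p.price_at_release) > 0
    )
    return hits / len(preds)


def simulated_return(preds, threshold=0.01, stake=1000.0):
    """Toy trading engine (assumed rules): trade only when the predicted
    relative move is at least `threshold`, go long on a predicted rise and
    short on a predicted fall, and close the position 20 minutes later."""
    invested = 0.0
    profit = 0.0
    for p in preds:
        expected_move = (p.predicted_price - p.price_at_release) / p.price_at_release
        if abs(expected_move) < threshold:
            continue  # predicted move too small to act on
        shares = stake / p.price_at_release
        direction = 1 if expected_move > 0 else -1
        profit += direction * shares * (p.actual_price - p.price_at_release)
        invested += stake
    return profit / invested if invested else 0.0


# Made-up sample data, not taken from the study.
sample = [
    Prediction(10.0, 10.2, 10.1),
    Prediction(20.0, 19.5, 20.4),
    Prediction(50.0, 50.1, 50.5),
]
print(closeness_mse(sample), directional_accuracy(sample), simulated_return(sample))
```

Keeping the three functions separate mirrors how the abstract reports them: a model can score well on closeness while still losing money in the trading simulation, since only predictions above the threshold are ever traded.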
Can the choice of words and tone used by the authors of financial news articles correlate with measurable stock price movements? If so, can the magnitude of price movement be predicted using these same variables? We investigate these questions using the Arizona Financial Text (AZFinText) system, a financial news article prediction system, and pair it with a sentiment analysis tool. Through our analysis, we found that subjective news articles were easier to predict in price direction (59.0% versus the 50.0% expected by chance alone), and using a simple trading engine, subjective articles garnered a 3.30% return. Looking further into the role of author tone in financial news articles, we found that articles with a negative sentiment were easiest to predict in price direction (50.9% versus the 50.0% expected by chance alone) and yielded a 3.04% trading return. Investigating negative sentiment further, we found that our system was able to predict price decreases in articles of a positive sentiment 53.5% of the time, and price increases in articles of a negative sentiment 52.4% of the time. We believe this result may be attributable to market traders behaving in a contrarian manner, e.g., see good news, sell; see bad news, buy.
We examine the problem of discrete stock price prediction using a synthesis of linguistic, financial and statistical techniques to create the Arizona Financial Text System (AZFinText). The research within this paper seeks to contribute to the AZFinText system by comparing AZFinText's predictions against existing quantitative funds and human stock pricing experts. We approach this line of research using textual representation and statistical machine learning methods on financial news articles partitioned by similar industry and sector groupings. Through our research, we discovered that stocks partitioned by Sectors were most predictable, with a Closeness measure (Mean Squared Error, MSE) of 0.1954, a predicted Directional Accuracy of 71.18%, and a Simulated Trading return of 8.50% (compared to 5.62% for the S&P 500 index). In direct comparisons to existing market experts and quantitative mutual funds, our system's trading return of 8.50% outperformed well-known trading experts. Our system also performed well against the top 10 quantitative mutual funds of 2005, where our system would have placed fifth. When comparing AZFinText against only those quantitative funds that monitor the same securities, AZFinText had a 2% higher return than the best performing quant fund.