This study aims to predict the direction of US stock prices by integrating time-varying effective transfer entropy (ETE) with various machine learning algorithms. First, we show that the ETE based on 3- and 6-month moving windows can be regarded as a market explanatory variable by analyzing the association between financial crises and Granger-causal relationships among stocks. Then, we find that the prediction performance for stock price direction improves when the ETE-driven variable is integrated as a new feature in logistic regression, multilayer perceptron, random forest, XGBoost, and long short-term memory network models. We also suggest using the adjusted accuracy, derived from the risk-adjusted return in finance, as a prediction performance measure. Lastly, we confirm that the multilayer perceptron and the long short-term memory network are more suitable for stock price prediction. This study is the first attempt to predict stock price direction using ETE, and the approach can be conveniently applied in practice.

INDEX TERMS Econophysics, Effective transfer entropy, Feature engineering, Information entropy, Machine learning, Prediction algorithms, Stock markets, Time series analysis

characteristics of the system using random matrix theory and network analysis [7]-[9]. Since then, studies have discovered that a linear model such as the Pearson correlation is not sufficient to quantify the relationships among stocks. More importantly, a causal relationship is not directly linked to the presence of correlation. In this context, Granger causality [10] was introduced to define the causal relationship between time series. However, it is limited to expressing the existence of information flow based on a linear relationship, rather than measuring the amount of information flow.
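To make the linear nature of Granger causality concrete, the following is a minimal sketch, not the procedure used in this study. It compares an autoregression of y on its own lags against one that also includes lags of x, and returns the usual F-statistic. The function names `lagged_matrix` and `granger_f_stat` and the default lag of 1 are illustrative assumptions.

```python
import numpy as np

def lagged_matrix(series, lag):
    """Columns are series[t-1], ..., series[t-lag] for t = lag .. n-1."""
    n = len(series)
    return np.column_stack([series[lag - k : n - k] for k in range(1, lag + 1)])

def granger_f_stat(x, y, lag=1):
    """F-statistic for the hypothesis 'x Granger-causes y' (linear model)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lag:]
    intercept = np.ones((len(target), 1))
    # Restricted model: y explained by its own past only.
    restricted = np.hstack([intercept, lagged_matrix(y, lag)])
    # Unrestricted model: y explained by its own past and the past of x.
    unrestricted = np.hstack([restricted, lagged_matrix(x, lag)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / dof)
```

A large statistic indicates that past values of x improve the linear prediction of y. It signals that information flow exists, but it does not measure how much information is transferred.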
To overcome the limitations of the simple linear model underlying the Granger-causal relationship, transfer entropy (TE), proposed by Schreiber [11], has been suggested as a measure of the amount of information flow. TE is a non-parametric measure of the amount of information transferred from one variable to another, based on the Shannon entropy.
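As an illustration of how TE can be estimated from data, the sketch below uses a plug-in estimator with history length 1 on quantile-discretized series. The number of bins, the history length, and the function names are assumptions made for this example. The shuffling correction in `effective_transfer_entropy` follows the commonly used definition of ETE (raw TE minus the mean TE obtained after shuffling the source series), which is assumed here rather than taken from the text above.

```python
import numpy as np
from collections import Counter

def discretize(series, n_bins=3):
    """Map values to integer symbols 0..n_bins-1 using empirical quantiles."""
    series = np.asarray(series, dtype=float)
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(series, edges)

def transfer_entropy(source, target):
    """Plug-in TE (bits) from source to target, history length 1."""
    s = np.asarray(source).tolist()
    t = np.asarray(target).tolist()
    # Each triple is (target at t+1, target at t, source at t).
    triples = list(zip(t[1:], t[:-1], s[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_past = Counter((y0, x0) for _, y0, x0 in triples)
    c_self = Counter((y1, y0) for y1, y0, _ in triples)
    c_y0 = Counter(y0 for _, y0, _ in triples)
    te = 0.0
    for (y1, y0, x0), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_past[(y0, x0)]      # p(y1 | y0, x0)
        p_given_self = c_self[(y1, y0)] / c_y0[y0]   # p(y1 | y0)
        te += p_joint * np.log2(p_given_both / p_given_self)
    return te

def effective_transfer_entropy(source, target, n_shuffles=50, seed=0):
    """Raw TE minus the mean TE over randomly shuffled source series."""
    rng = np.random.default_rng(seed)
    raw = transfer_entropy(source, target)
    shuffled = [transfer_entropy(rng.permutation(source), target)
                for _ in range(n_shuffles)]
    return raw - float(np.mean(shuffled))
```

Shuffling destroys the temporal link between source and target while keeping the source distribution, so the subtracted term approximates the small-sample bias of the raw estimate. In a moving-window setting, the same function would be applied to each window of discretized returns.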