This research proposes a binary segmentation algorithm, based on the Iterative Cumulative Sum of Squares (ICSS), for detecting change points in financial time series. The proposed algorithm, entitled KW-ICSS, uses the non-parametric Kruskal-Wallis test in its cross-validation procedures. As a result, KW-ICSS can detect change points in non-normally distributed time series more quickly, with fewer observations after the change points, than the state-of-the-art ICSS algorithm, entitled AIT-ICSS. For simulated financial time series whose true change point locations are known, KW-ICSS detects the change points with an average true positive rate of 81% across different numbers of change points, whereas AIT-ICSS achieves only 72.57%. KW-ICSS's mean absolute deviation between the true and detected change points is also smaller than that of AIT-ICSS at different significance levels. The experiment further finds that the significance level, which is the model parameter, should be set to less than 10%. For real-world financial time series whose true change point locations are unknown, KW-ICSS's robustness shows in fewer detected change points and longer intervals between them. Furthermore, KW-ICSS's short-term trend prediction reaches an average accuracy of 92.47%, whereas AIT-ICSS reaches 90.69%. Therefore, we claim that KW-ICSS successfully improves on AIT-ICSS.
INDEX TERMS Unsupervised learning, change point detection, iterative cumulative sum of squares, Kruskal-Wallis.
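The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' KW-ICSS implementation: it computes the classical centered cumulative sum of squares statistic D_k = C_k / C_T - k / T used by ICSS-type methods to locate one candidate variance change point, and a textbook Kruskal-Wallis H statistic (without tie correction). Applying H to the squared observations before and after the candidate is an assumption made here for illustration; the abstract does not specify how the test enters the cross-validation procedure. All function names are hypothetical.

```python
import math
import random


def centered_cusum_of_squares(x):
    """Return D_k = C_k / C_T - k / T for k = 1..T, where C_k is the
    cumulative sum of squared observations up to index k."""
    total = sum(v * v for v in x)
    n = len(x)
    running = 0.0
    d = []
    for k, v in enumerate(x, start=1):
        running += v * v
        d.append(running / total - k / n)
    return d


def candidate_change_point(x):
    """Return (k, stat): k is the 1-based index maximizing |D_k| and
    stat = sqrt(T / 2) * |D_k| is the scaled statistic."""
    d = centered_cusum_of_squares(x)
    best = max(range(len(d)), key=lambda i: abs(d[i]))
    return best + 1, math.sqrt(len(x) / 2.0) * abs(d[best])


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks for tied values
    (no tie correction is applied)."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        i = j
    weighted = sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)


if __name__ == "__main__":
    # Simulated series (illustrative only): the standard deviation
    # changes from 1 to 3 after observation 200.
    random.seed(0)
    series = [random.gauss(0, 1) for _ in range(200)]
    series += [random.gauss(0, 3) for _ in range(200)]

    k, stat = candidate_change_point(series)
    print("candidate change point:", k, "scaled statistic:", round(stat, 3))

    # Assumed usage: compare squared values on each side of the candidate.
    before = [v * v for v in series[:k]]
    after = [v * v for v in series[k:]]
    print("Kruskal-Wallis H:", round(kruskal_wallis_h([before, after]), 3))
```

In the classical ICSS procedure the scaled statistic is compared with an asymptotic critical value (about 1.358 at the 5% level), and for two groups H is compared with a chi-square quantile with one degree of freedom (about 3.841 at the 5% level). A full binary segmentation would repeat the search on each resulting segment.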
This research proposes and examines an investment strategy that combines natural language processing of equity research reports published in the Korean financial market with machine learning algorithms for binary classification. First, we extract the parts of speech from each report using KoNLPy and Mecab. Then, we define 33 features as input variables and perform binary classification on the price direction of the stocks recommended in the reports using various machine learning algorithms. We investigate the model performance in detail by dividing the entire period into three sub-periods: pre-COVID-19 for the sideways market, COVID-19 for the crashing market, and post-COVID-19 for the extremely bullish market. We confirm that the random forest is the best classifier for all periods, so we use the stocks it predicts as positive in the test set as the investment universe for a buy-and-hold strategy with monthly rebalancing. The proposed strategy shows a significantly higher return on investment than the benchmarks during the pre-COVID-19 and COVID-19 periods, and a comparable return during the post-COVID-19 period.
INDEX TERMS Finance, natural language processing, stock markets, equity research reports, binary classification, investment strategy.
I. INTRODUCTION
Financial companies periodically issue research reports for investors. These reports analyze companies, financial institutions, diplomatic issues between countries, and politics. Among them, this study focuses on the equity research report, which recommends one specific stock at a time. Usually, analysts present their perspective on a stock that they expect, based on various quantitative and qualitative analyses, to show high returns in the future. However, the realized future profit varies across reports. One reason is that the person who writes the report may lack sufficient analytical skills, which results in low-quality reports.
In this study, we assume that the composition of equity research reports, quantified through natural language processing (NLP), can distinguish the reliability of the stock recommendations. In the 2010s, the volume of digital online content exploded, including market analysis reports, news articles, journal texts, online blogs, and social media. Accordingly, research on analyzing public sentiment, especially opinion mining in social media, has become essential. As market prediction using NLP algorithms has been studied in the financial field, a research field called natural language-based financial forecasting has been gradually established [1]-[4]. In particular, the stock market has received great attention in academia due to its sensitivity to market participants' sentiment. That is, investors' sentiment can change the overall trend of individual stocks and even the market. Many previous studies have analyzed investors' opinions and market sentiment from social media posts regarding financial markets. Some studies extra...