Stock forecasting is a significant and challenging task. The recent development of web technologies has transformed communication channels, allowing the public to share information over the web through news, social media content, and similar sources, and thus causing exponential growth in web data. This massively available information might be the key to accounting for the financial market’s unexplained variability and improving forecasting accuracy. However, the information is usually expressed in unstructured natural language and carries different inherent meanings. Although a human can easily interpret these inherent messages, manually processing such a massive amount of textual data remains impractical because of constraints on time, ability, and energy. Because text sources differ in their properties, it is crucial to understand the various text processing approaches in order to optimize forecasting performance. This study summarizes and discusses current text-based financial forecasting approaches, grouped into semantic-based, sentiment-based, event-extraction-based, and hybrid approaches. It then discusses the strengths and weaknesses of each approach, followed by a comparison and their suitable application scenarios. Finally, the study highlights future research directions in text-based stock forecasting; the overall discussion is expected to provide insightful analysis for future reference.