Microblogging websites such as Twitter have made sentiment analysis research increasingly popular over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset containing Flemish tweets. The key contribution of our paper lies in its experimental design: we compare different preprocessing techniques and vector representations to find the best-performing combination for a Flemish dataset. We compare models belonging to four different categories: lexicon-based methods, traditional machine-learning models, neural networks, and attention-based models. We find that more preprocessing leads to better results, but the best-performing vector representation depends on the model applied. Moreover, we observe a large performance gap between the lexicon-based approaches and the other models. The traditional machine-learning approaches and the neural networks produce similar results, while the attention-based model performs best. Nevertheless, the performance gains must be weighed against the additional computational cost.
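To make the experimental design concrete, the following is a minimal sketch of how preprocessing variants can be compared for a lexicon-based baseline. It is not the authors' pipeline: the lexicon, the toy tweets (English stand-ins for Flemish text), and all function names are hypothetical, and the scores it produces say nothing about the results reported in the paper.

```python
import re

# Hypothetical toy polarity lexicon (English stand-ins, not from the paper).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}


def strip_noise(text):
    """Remove URLs and @mentions, and drop the '#' sign from hashtags."""
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return text.replace("#", " ")


def lowercase(text):
    return text.lower()


def preprocess(text, steps):
    """Apply the chosen preprocessing steps, then split into letter-only tokens."""
    for step in steps:
        text = step(text)
    return re.findall(r"[^\W\d_]+", text)


def lexicon_label(tokens):
    """Sum the polarity of known tokens and map the sign to a label."""
    score = sum(LEXICON.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(dataset, steps):
    hits = sum(lexicon_label(preprocess(text, steps)) == label
               for text, label in dataset)
    return hits / len(dataset)


# Hypothetical labeled tweets, made up for illustration only.
TOY = [
    ("GREAT service @shop https://t.co/x", "positive"),
    ("awful delay again", "negative"),
    ("#Bad weather today", "negative"),
    ("Train leaves at 9", "neutral"),
    ("@good_deals posted a schedule", "neutral"),
]

# Each entry is one preprocessing configuration to evaluate.
PIPELINES = {
    "none": [],
    "lowercase": [lowercase],
    "strip_noise": [strip_noise],
    "strip_noise+lowercase": [strip_noise, lowercase],
}

results = {name: accuracy(TOY, steps) for name, steps in PIPELINES.items()}
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

The same loop extends to the other model families by swapping `lexicon_label` for a trained classifier and adding the vector representation as a second dimension of the grid.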
https://feb.kuleuven.be/research/decision-sciences-and-information-management/liris/liris
Mapping innovation in companies for the purpose of official statistics is usually done through business surveys. However, this traditional approach has several drawbacks, such as low response rates, response bias, low frequency, and high costs. Text-based models have therefore been developed to complement or replace traditional business surveys, using web-scraped company websites as input texts. Previous research has mostly used these models to classify companies as innovative or non-innovative. This paper uses web scraping and text-based models to map business innovation in Flanders. What differentiates this research from previously published work is its focus on the different types of innovation, discovered through topic modeling. More specifically, the scraped web texts are used to identify innovative economic sectors or topics and to classify firms into these topics. The Flemish firms considered in this research are those that participated in the CIS 2019. We found that the Top2Vec model can discover topics related to innovation within a large unstructured text corpus. Subsequently, the Lbl2Vec model can take the Top2Vec output as input to classify firms into the discovered topics. This paper therefore shows the potential of combining the Top2Vec and Lbl2Vec models to discover topics (or sectors) and classify companies into them, which yields an additional parameter for mapping innovation in different regions.
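The two-stage idea (discover topic keywords first, then assign firms to topics by similarity) can be sketched as follows. This is a simplified stand-in, not the Top2Vec or Lbl2Vec API: plain bag-of-words cosine similarity replaces the learned embeddings, and the topic keywords, firm names, and website texts are all hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical keywords, standing in for the topic words a topic model
# would return in the first stage.
TOPIC_KEYWORDS = {
    "biotech": ["vaccine", "clinical", "laboratory", "protein"],
    "software": ["cloud", "platform", "software", "data"],
}


def classify(text, topics, threshold=0.0):
    """Return the most similar topic (or None if no topic beats the threshold)
    together with the per-topic similarity scores."""
    doc = bow(text)
    scores = {name: cosine(doc, Counter(words)) for name, words in topics.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


# Hypothetical scraped website texts.
sites = {
    "firm_a": "We build a cloud data platform and custom software.",
    "firm_b": "Our laboratory runs clinical trials for a new vaccine.",
    "firm_c": "Family bakery with fresh bread every morning.",
}

assignments = {firm: classify(text, TOPIC_KEYWORDS)[0] for firm, text in sites.items()}
print(assignments)
```

The threshold lets a firm remain unassigned when its website matches none of the discovered topics, which matters when only some topics relate to innovation.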