Financial market participants continuously monitor financial and economic news. According to the efficient market hypothesis, all past information is reflected in stock prices, and new information is instantaneously absorbed in determining future stock prices. Hence, prompt extraction of positive or negative sentiment from news is very important for investment decision-making by traders, portfolio managers, and investors. Sentiment analysis models can provide an efficient method for extracting actionable signals from the news. However, financial sentiment analysis is challenging due to domain-specific language and the unavailability of large labeled datasets. General sentiment analysis models are ineffective when applied to specific domains such as finance. To overcome these challenges, we design an evaluation platform that we use to assess the effectiveness and performance of various sentiment analysis approaches, based on combinations of text representation methods and machine-learning classifiers. We perform more than one hundred experiments using publicly available datasets labeled by financial experts. We start the evaluation with lexicons specific to sentiment analysis in finance and gradually extend the study to include word and sentence encoders, up to the latest available NLP transformers. The results show that contextual embeddings are more effective for sentiment analysis than lexicons and fixed word and sentence encoders, even when large datasets are not available. Furthermore, distilled versions of NLP transformers produce results comparable to those of their larger teacher models, which makes them suitable for use in production environments.
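As a rough illustration of the lexicon-based starting point described above, the following minimal Python sketch labels a sentence by counting positive and negative lexicon words and then measures accuracy on a few labeled sentences. The word lists, the function names `lexicon_sentiment` and `accuracy`, and the example sentences are all hypothetical and invented for this sketch; they are not the lexicons, datasets, or evaluation platform used in the study.

```python
import re

# Hypothetical toy lexicon, for illustration only; not a published financial lexicon.
POSITIVE = {"profit", "growth", "gain", "rose", "beat", "upgrade"}
NEGATIVE = {"loss", "decline", "fell", "miss", "downgrade", "lawsuit"}


def lexicon_sentiment(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(examples):
    """Fraction of (text, label) pairs where the lexicon label matches the given label."""
    correct = sum(lexicon_sentiment(text) == label for text, label in examples)
    return correct / len(examples)


# Made-up sentences and labels, used only to exercise the functions above.
TOY_EXAMPLES = [
    ("Quarterly profit rose on strong growth", "positive"),
    ("Shares fell after the company reported a loss", "negative"),
    ("The board meets on Tuesday", "neutral"),
    ("Costs fell sharply, lifting margins", "positive"),
]

if __name__ == "__main__":
    for text, label in TOY_EXAMPLES:
        print(f"{lexicon_sentiment(text):8s} (expected {label}): {text}")
    print("accuracy:", accuracy(TOY_EXAMPLES))
```

In this toy run the last sentence is labeled negative because "fell" is scored without regard to what fell, which is the kind of context-insensitivity that may help explain why contextual embeddings compare favorably with lexicons.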
Academic disciplines and their interrelationships form a backbone that organizes the enormous amount of documented human knowledge available today. An up-to-date overview of the established disciplines, the emerging ones, and their mutual interactions is essential to academic institutions, publishers, and many other actors in today's knowledge-based society, even though the term "academic discipline" itself lacks a precise definition. Discipline classification schemes are crucial resources for this purpose, and when the rate of knowledge production requires changes in their structure to be discovered very frequently, data-driven methodologies that facilitate their revision become essential. Analyzing the worldwide community's opinion on what constitutes a discipline, as expressed in Wikipedia, can be very informative for this purpose, given Wikipedia's comprehensiveness, continuous updates, and the availability of historical exports. This paper proposes a data-driven methodology for identifying the concepts that the worldwide community defines as disciplines at a particular moment, by analyzing the information available in Wikipedia at that same moment. It also discusses Wikipedia's strengths and challenges for the task and compares a variety of Machine Learning and Natural Language Processing methodologies. The trained models achieve high accuracy on datasets created specifically for this task, and their accuracy changes little across four Wikipedia exports from 2015 to 2018.
Index Terms: Machine learning algorithms, natural language processing, academic discipline, text analysis, Wikipedia.