Existing sentiment analysis systems rely mainly on statistical and mathematical methods. A more promising direction, however, is the study of the linguistic features of evaluative expressions. The results of formalizing these features can be applied both in affective computing, to further improve automatic systems, and in linguistics and related disciplines.
The novelty of this study lies mainly in the development of an algorithm based on the identified linguistic rules. In addition, the research material is political discourse, which specialists in affective computing have not yet studied sufficiently. The work is relevant because of the growing need to categorize information published on the Internet.
The purpose of the study is to develop a system for machine sentiment analysis of English-language political texts and to identify aspects and their distribution for subsequent use in enhancing the system. The article discusses the linguistic features of sentiment analysis and proposes a classification of linguistic units with sentiment potential according to the levels of language structure. It also presents the results of an experiment testing the sentiment analysis system on 300 news articles and user comments taken from reddit.com/r/politics. The accuracy of the system is 92%. In addition, 40 selected comments were manually annotated, and the expert identified 25 aspects in the process. Furthermore, three formal patterns were identified in the distribution of aspect terms, which is necessary for creating an automatic system. First, aspect terms are repeated in two consecutive sentences. Second, aspect terms are often the themes of sentences. Third, aspect terms occur with high frequency at the beginning and end of the text (document).
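The three distributional patterns can be illustrated with a minimal Python sketch. It is not the system described in the article: the function names, the regular-expression tokenizer, the restriction to single-word aspect terms, and both thresholds (`theme_window`, `edge_fraction`) are assumptions made for illustration. In particular, the theme of a sentence is only approximated here by its first few tokens, whereas a real system would need syntactic analysis.

```python
import re


def tokenize(sentence):
    """Lowercase word tokens; a deliberately crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", sentence.lower())


def aspect_features(sentences, aspect_terms, theme_window=3, edge_fraction=0.25):
    """Return, per single-word aspect term, three booleans mirroring the patterns.

    theme_window and edge_fraction are hypothetical parameters, not values
    taken from the article.
    """
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    # Number of sentences treated as the "beginning" or "end" of the text.
    edge = max(1, int(n * edge_fraction))
    features = {}
    for term in aspect_terms:
        # Indices of the sentences that contain the term.
        hits = [i for i, toks in enumerate(tokens) if term in toks]
        features[term] = {
            # Pattern 1: the term occurs in two consecutive sentences.
            "repeated_in_consecutive": any(
                b - a == 1 for a, b in zip(hits, hits[1:])
            ),
            # Pattern 2 (approximation): the term is among the first tokens,
            # used as a rough proxy for the sentence theme.
            "in_theme_position": any(
                term in tokens[i][:theme_window] for i in hits
            ),
            # Pattern 3: the term occurs near the start or end of the document.
            "at_text_edge": any(i < edge or i >= n - edge for i in hits),
        }
    return features


if __name__ == "__main__":
    comment = [
        "The senate passed the bill today.",
        "The bill raises taxes on imports.",
        "Critics were not impressed.",
        "Nobody expects the economy to improve.",
    ]
    for term, feats in aspect_features(comment, ["bill", "economy"]).items():
        print(term, feats)
```

Features of this kind could serve as inputs to a candidate ranker for aspect terms; how the article's system actually uses the patterns is not specified in this abstract.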