The importance of determining the sentiment of short texts grows with the number of comments on social networks. Negation strongly affects the sentiment of these texts, because its scope covers a larger share of a short text. In this paper, we examine how the treatment of negation impacts the sentiment analysis of tweets in the Serbian language. The grammatical rules that influence the change of polarity are processed. We performed an analysis of the effect of the negation treatment on the overall process of sentiment analysis. A statistically significant relative improvement was obtained when negation was processed using our rules: up to 31.16% with the lexicon-based approach and up to 2.65% with machine learning methods. By applying machine learning methods, an accuracy of 68.84% was achieved on a set of positive, negative and neutral tweets, and an accuracy of as much as 91.13% on the set of positive and negative tweets.
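To illustrate why negation handling matters for a lexicon-based approach, the following minimal sketch flips the polarity of sentiment words that fall within a fixed window after a negation cue. The tiny lexicon, the list of cues, and the two-token window are illustrative assumptions; the paper's actual grammatical rules for Serbian are not given in this abstract and are not reproduced here.

```python
# Toy resources: illustrative assumptions, not the lexicon or rules from the paper.
LEXICON = {"dobar": 1, "odličan": 1, "loš": -1, "užasan": -1}
NEGATION_CUES = {"ne", "nije", "nisam", "nema"}
NEGATION_WINDOW = 2  # hypothetical scope: tokens affected after a cue


def score_tweet(text, handle_negation=True):
    """Sum lexicon polarities, optionally flipping those inside a negation scope."""
    score = 0
    remaining = 0  # tokens still inside the current negation scope
    for token in text.lower().split():
        if token in NEGATION_CUES:
            remaining = NEGATION_WINDOW
            continue
        polarity = LEXICON.get(token, 0)
        if handle_negation and remaining > 0:
            polarity = -polarity
        score += polarity
        if remaining > 0:
            remaining -= 1
    return score


def label(score):
    """Map a numeric score to a three-class sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = "film nije dobar"
    print(label(score_tweet(tweet, handle_negation=False)))
    print(label(score_tweet(tweet, handle_negation=True)))
```

Without negation handling, the sketch scores the tweet by its only lexicon word; with handling enabled, the cue reverses that word's polarity. A fixed window is the simplest possible scope model and would be replaced by language-specific rules in practice.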
The development of information technology increases its use in various spheres of human activity, including healthcare. Large volumes of data and reports, such as symptoms, medical history, and doctors' observations of patients' health, are generated and stored in textual form. Electronic recording of patient data not only facilitates day-to-day work in hospitals, enables more efficient data management and reduces material costs, but can also be used for further processing and to gain knowledge to improve public health. Publicly available health data would contribute to the development of telemedicine, e-health, epidemic control, and smart healthcare within smart cities. This paper describes the importance of textual data normalization for smart healthcare services. An algorithm for normalizing medical data in Serbian is proposed (F1-score = 0.816) in order to prepare the data for further processing, in this case within the smart health framework. In addition to the normalized medical records, applying this algorithm yields corpora of keywords and stop words specific to the medical domain, which can be used to improve the results of normalizing medical textual data.
Natural language processing techniques make it possible to obtain a great deal of information from documents, for example by extracting document topics, mapping document keywords, or classifying documents by content. An important step in obtaining this information is to separate the words that carry informative value in a sentence from those that do not affect its meaning. Words that do not carry meaning in the sentence are marked using dictionaries of stop words specific to each natural language. This paper presents the creation of a stop word dictionary for Serbian. The influence of stop words on text processing is presented on three different data sets. It is shown that using the proposed dictionary of Serbian stop words reduces the data set dimension by between 15% and 39%, while the quality of the obtained n-gram language models is improved.
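As a rough illustration of how a stop word dictionary shrinks a data set, the sketch below filters tokens against a small stop list and reports the percentage reduction. The stop list is a handful of common Serbian function words chosen for illustration, not the dictionary built in the paper, and measuring "dimension" as token count is an assumption, since the abstract does not define it.

```python
# Illustrative stop list: a few common Serbian function words,
# not the dictionary created in the paper.
STOP_WORDS = {"i", "je", "u", "na", "da", "se"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only tokens that are not in the stop word set."""
    return [t for t in tokens if t.lower() not in stop_words]


def reduction_percent(original, filtered):
    """Percentage of tokens removed; 0.0 for an empty input."""
    if not original:
        return 0.0
    return 100.0 * (len(original) - len(filtered)) / len(original)


if __name__ == "__main__":
    tokens = "danas je lep dan i idemo u park".split()
    filtered = remove_stop_words(tokens)
    print(filtered)
    print(reduction_percent(tokens, filtered))
```

Counting distinct word types instead of tokens would give a different figure, so the choice of measure should match how the data set dimension is defined in a given experiment.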