As in other languages, a large number of intensifiers are used in Urdu. An intensifier may occur once or several times consecutively. In sentiment analysis of Urdu text, these intensifiers need special treatment to obtain more accurate results, which is the main focus of this research. A wide-coverage Urdu sentiment lexicon is developed in which intensifiers are identified and placed in a separate file. In developing the Urdu sentiment analyser, rules are formulated specifically for assigning polarities to intensifiers in text when they are surrounded by positive or negative words. Results show that, compared with traditional methods, the method is effective in correctly classifying Urdu sentences as positive, negative, or neutral. Implementing the rules for intensifiers increased the accuracy of the Urdu sentiment analyser from 78.33% to 83.42%, a statistically significant improvement. It is concluded that intensifiers cannot be ignored in sentiment analysis; handling them effectively can significantly improve the performance of a sentiment analyser.
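The abstract does not spell out the rules themselves, so the following is only a minimal sketch of how one such rule might look. It assumes that an intensifier (or a run of consecutive intensifiers) directly before a sentiment word takes on that word's polarity and adds to its weight. The romanized word lists, the unit weights, and the function names are hypothetical placeholders, not the authors' lexicon or scoring scheme.

```python
# Hypothetical, romanized stand-ins for the lexicon files; a real lexicon
# would hold Urdu-script entries and far wider coverage.
POSITIVE = {"acha", "umda"}
NEGATIVE = {"bura", "kharab"}
INTENSIFIERS = {"bohat", "nihayat"}  # kept in a separate set, as in the abstract


def word_polarity(token):
    """Return +1, -1 or 0 for a single token."""
    if token in POSITIVE:
        return 1
    if token in NEGATIVE:
        return -1
    return 0


def score_sentence(tokens):
    """Sum polarities; each intensifier immediately preceding a sentiment
    word inherits that word's polarity (assumed rule, weight 1 each)."""
    score = 0
    pending = 0  # consecutive intensifiers seen just before this token
    for token in tokens:
        if token in INTENSIFIERS:
            pending += 1
            continue
        score += word_polarity(token) * (1 + pending)
        pending = 0  # intensifiers only attach to the very next word
    return score


def classify(tokens):
    """Map the sentence score to positive, negative, or neutral."""
    score = score_sentence(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, a sentence with one plain positive word and one intensified negative word is classified as negative, whereas ignoring the intensifier would leave it tied at neutral.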
Context-dependency in Urdu words needs careful handling in Urdu Sentiment Analysis. In this research, an already constructed Urdu Sentiment Lexicon of positive and negative words is expanded with context-dependent words, which may be used with or without conjunctions. Rules are formulated for assigning polarities to context-dependent words that are surrounded by positive or negative words, and these rules are incorporated in the Urdu Sentiment Analyzer. Combining these rules with the expanded Urdu Sentiment Lexicon increased the accuracy of the Urdu Sentiment Analyzer from 83.43% to 89.03%, with 0.8655 Precision, 0.9053 Recall and 0.8799 F-measure, a statistically significant improvement.
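As a rough illustration of a rule of this kind, the sketch below lets a context-dependent word adopt the polarity of the nearest positive or negative word within a small window. The window size, the romanized entries, and the function names are assumptions made for the example; the abstract does not state the actual rules, and conjunction handling is not modelled here.

```python
# Hypothetical, romanized stand-ins for lexicon entries.
LEXICON = {"munafa": 1, "nuqsan": -1}   # fixed-polarity words
CONTEXT_DEPENDENT = {"zyada", "kam"}    # polarity depends on neighbours


def resolve_context_polarity(tokens, index, window=2):
    """Return the polarity of the nearest sentiment word within `window`
    tokens of position `index`, or 0 if there is none (assumed rule)."""
    for distance in range(1, window + 1):
        for j in (index - distance, index + distance):
            if 0 <= j < len(tokens):
                polarity = LEXICON.get(tokens[j], 0)
                if polarity != 0:
                    return polarity
    return 0


def score_with_context(tokens, window=2):
    """Sum fixed polarities plus the resolved polarity of each
    context-dependent word."""
    score = 0
    for i, token in enumerate(tokens):
        if token in CONTEXT_DEPENDENT:
            score += resolve_context_polarity(tokens, i, window)
        else:
            score += LEXICON.get(token, 0)
    return score
```

With these placeholder entries, the same context-dependent word contributes positively next to a positive word, negatively next to a negative one, and nothing when no sentiment word is nearby.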
Although researchers have worked on Urdu Sentiment Analysis, there is still considerable room for improvement in accuracy. In this research, therefore, the accuracy of Urdu Sentiment Analysis across multiple domains is improved by handling negations with the Lexicon-based approach, one of the most widely used approaches to Sentiment Analysis. The research focuses on negations because of the important role they play in Sentiment Analysis. Both local and long-distance negations are considered. To this end, a corpus of 6025 Urdu sentences from 151 blogs belonging to 14 different genres is used, in which the use of negations is carefully observed. Two major steps are taken. First, morphological negations are included in the negative word file of the Urdu Sentiment Lexicon developed for Sentiment Analysis of Urdu blogs. Second, a rule-based approach is used for implicit and explicit negations, with rules designed to handle both effectively. Implementing these rules increased the accuracy of the Sentiment Analyzer from 73.88% to 78.32%, with Precision, Recall and F-measure of 0.745, 0.788 and 0.745 respectively, a statistically significant improvement.
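The two steps can be sketched roughly as follows, under stated assumptions: morphological negations are simply stored as negative lexicon entries, and an explicit negation particle flips the polarity of the nearest preceding sentiment word within a window (a small window for local negation, a larger one for long-distance negation). The romanized entries, the window rule, and the function name are hypothetical; implicit negations are not modelled, and the abstract does not give the authors' actual rules.

```python
# Hypothetical, romanized stand-ins for lexicon entries.
LEXICON = {
    "acha": 1,
    "bura": -1,
    "napasand": -1,  # morphological negation stored directly as negative
}
NEGATIONS = {"nahi"}  # explicit negation particle


def score_with_negation(tokens, window=3):
    """Sum word polarities after flipping the nearest sentiment word that
    precedes each negation particle by at most `window` tokens."""
    polarities = [LEXICON.get(token, 0) for token in tokens]
    for i, token in enumerate(tokens):
        if token not in NEGATIONS:
            continue
        # Walk backwards from the particle, at most `window` tokens.
        for j in range(i - 1, max(i - 1 - window, -1), -1):
            if polarities[j] != 0:
                polarities[j] = -polarities[j]
                break
    return sum(polarities)
```

In this sketch the `window` parameter is what separates local from long-distance handling: with `window=1` only a sentiment word directly before the particle is negated, while a larger value reaches further back in the sentence.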