The rapid growth of opinionated text on the Web increases the demand for efficient methods of detecting subjective texts. This paper proposes a subjectivity detection method that uses a language-model-based structure to define a subjectivity score for each document, such that the topic relevance of a document does not affect its subjectivity score. To overcome the limited content of short documents, we further propose an expansion method to better estimate the language models. Because the lack of linguistic resources in resource-lean languages like Persian makes subjectivity detection difficult in these languages, the method is proposed in two versions: a semi-supervised version for resource-lean languages and a supervised version. Experimental evaluations on five datasets in two languages, English and Persian, demonstrate that the method performs well in distinguishing subjective documents from objective ones in both languages.
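To make the idea of a language-model-based subjectivity score concrete, the following is a minimal sketch and not the method of the paper: it assumes two add-one-smoothed unigram language models, one estimated from subjective text and one from objective text, and scores a document by its average per-token log-likelihood ratio. The function names, the smoothing choice, and the toy training sentences are all illustrative assumptions; the topic-independence and expansion steps mentioned in the abstract are not modeled here.

```python
import math
from collections import Counter


def train_unigram(docs):
    """Count tokens over a list of documents (lowercased, whitespace-tokenized)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def subjectivity_score(doc, subj_counts, obj_counts):
    """Average per-token log-likelihood ratio log P(w|subj) - log P(w|obj).

    Assumption: add-one (Laplace) smoothing over the joint vocabulary,
    with one extra slot reserved for unseen words.
    Positive values lean subjective, negative values lean objective.
    """
    vocab_size = len(set(subj_counts) | set(obj_counts)) + 1
    subj_total = sum(subj_counts.values())
    obj_total = sum(obj_counts.values())

    tokens = doc.lower().split()
    if not tokens:
        return 0.0

    ratio = 0.0
    for tok in tokens:
        p_subj = (subj_counts[tok] + 1) / (subj_total + vocab_size)
        p_obj = (obj_counts[tok] + 1) / (obj_total + vocab_size)
        ratio += math.log(p_subj) - math.log(p_obj)
    return ratio / len(tokens)


# Hypothetical toy training data, for illustration only.
SUBJ_DOCS = ["i love this wonderful movie", "what a terrible awful film"]
OBJ_DOCS = ["the movie was released in 2010", "the film runs for two hours"]

subj_lm = train_unigram(SUBJ_DOCS)
obj_lm = train_unigram(OBJ_DOCS)

opinion_score = subjectivity_score("i love this awful film", subj_lm, obj_lm)
fact_score = subjectivity_score("the film was released", subj_lm, obj_lm)
```

With these toy models, the opinionated sentence receives a positive score and the factual one a negative score. Dividing by the token count keeps scores comparable across documents of different lengths, which matters for the short documents the abstract mentions.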
This paper presents the methods used and the results obtained by our team, Emad, in the OffensEval 2019 shared task organized at SemEval 2019. The OffensEval shared task includes three sub-tasks: offensive language identification, automatic categorization of offense types, and offense target identification. We participated in sub-task A and tried various methods, including traditional machine learning methods, deep learning methods, and a combination of the two. We also proposed a data augmentation method using word embeddings to improve the performance of our methods. The results show that the augmentation approach outperforms the other methods in terms of macro-F1.
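The abstract does not specify how word embeddings are used for augmentation. One common variant, sketched here purely as an assumption, creates new training samples by replacing a token with its nearest neighbor in embedding space when the cosine similarity exceeds a threshold. The tiny two-dimensional vectors, the `min_sim` threshold, and all function names below are hypothetical; a real system would load pretrained embeddings instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor(word, embeddings, min_sim=0.9):
    """Return the most similar other word, or None if none exceeds min_sim."""
    if word not in embeddings:
        return None
    best, best_sim = None, min_sim
    for other, vec in embeddings.items():
        if other == word:
            continue
        sim = cosine(embeddings[word], vec)
        if sim > best_sim:
            best, best_sim = other, sim
    return best


def augment(tokens, embeddings, min_sim=0.9):
    """Produce one new sample per token that has a close embedding neighbor.

    Each new sample keeps the label of the original sample (an assumption
    that only holds if near neighbors preserve the offensive/non-offensive sense).
    """
    samples = []
    for i, tok in enumerate(tokens):
        neighbor = nearest_neighbor(tok, embeddings, min_sim)
        if neighbor is not None:
            samples.append(tokens[:i] + [neighbor] + tokens[i + 1:])
    return samples


# Hypothetical toy embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "stupid": [0.90, 0.10],
    "dumb": [0.88, 0.15],
    "person": [0.10, 0.90],
    "guy": [0.15, 0.88],
    "you": [0.50, 0.50],
}

augmented = augment(["you", "stupid", "person"], TOY_EMBEDDINGS)
```

In this toy setup, "stupid" and "person" each have a close neighbor while "you" does not, so the call yields two additional samples. The similarity threshold is the main design choice: set too low, replacements may change the label of the sample.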
News media websites are important online resources that have drawn considerable attention from text mining researchers. The main aim of this study is to propose a framework for ranking online news websites from different viewpoints. The ranking of news websites is useful information that can benefit many news-related tasks, such as news retrieval and news recommendation. In the proposed framework, the ranking is obtained by calculating three measures, introduced in the paper, that are based on user-generated content. Each measure captures the performance of a news website from a particular viewpoint: the completeness of its news reports, the diversity of the events it covers, and its speed. The use of user-generated content, as partly unbiased, real-time, and low-cost content on the Web, distinguishes the proposed ranking framework from prior work in the literature. The results obtained for three prominent news websites, BBC, CNN, and NYTimes, show that BBC has the best performance in terms of news completeness and speed, and NYTimes has the best diversity in comparison with the other two websites.