Arabic news articles in electronic collections are difficult to work with, and browsing by category is rarely supported. While machine learning methods have been applied successfully to similar tasks for English news articles, little research has produced suitable solutions for Arabic news. As part of a QNRF-funded project to build a digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237K Arabic news articles; the software should also be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that suits the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was refined with the help of an expert librarian and five Arabic-speaking volunteers. We developed tailored stemming (a new Arabic light stemmer) and automatic classification methods (the best being binary SVM classifiers) to work with the taxonomy. Using evaluation techniques standard in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification outperforms state-of-the-art techniques.
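To illustrate what a light stemmer does, here is a minimal Python sketch. It is not the new stemmer developed in this work, whose rules are not given here: the affix lists, the minimum-stem-length limits, and the names `normalize` and `light_stem` are assumptions, modeled loosely on the widely used light10 stemmer of Larkey et al. A light stemmer removes only a few frequent prefixes and suffixes instead of extracting the root.

```python
# Hypothetical light stemmer in the style of light10 (Larkey et al.);
# the affix lists and length limits below are assumptions, not this paper's rules.
ARTICLES = ("وال", "بال", "كال", "فال", "لل", "ال")  # longest first
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
MIN_STEM = 2  # never strip an affix if fewer letters than this would remain

# Map hamzated alef forms to bare alef and alef maqsura to ya.
_NORMALIZE = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ى": "ي"})


def normalize(word: str) -> str:
    """Unify common spelling variants and drop diacritics and tatweel."""
    word = word.translate(_NORMALIZE)
    # U+064B..U+0652 are tanween, short vowels, shadda and sukun; U+0640 is tatweel.
    return "".join(
        ch for ch in word if not ("\u064b" <= ch <= "\u0652" or ch == "\u0640")
    )


def light_stem(word: str) -> str:
    """Strip a few frequent prefixes and suffixes; no root extraction."""
    w = normalize(word)
    # Conjunction waw ("and"): strip only if at least 3 letters remain.
    if w.startswith("و") and len(w) - 1 >= 3:
        w = w[1:]
    # Strip at most one article-like prefix (e.g. "ال" = "the").
    for p in ARTICLES:
        if w.startswith(p) and len(w) - len(p) >= MIN_STEM:
            w = w[len(p):]
            break
    # Try each suffix once, in list order.
    for s in SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            w = w[: -len(s)]
    return w


if __name__ == "__main__":
    for w in ["والكتاب", "المكتبات", "كتابها", "الأخبار"]:
        print(w, "->", light_stem(w))
```

With these assumed rules, "والكتاب" and "كتابها" both reduce to "كتاب", so documents using either form share an index term. Conflation of this kind is less aggressive than root-based stemming.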
Recommender systems suggest items to users based on their interests. They are widely used in domains such as online stores, web advertising, and social networks. To find items of interest, recommender systems rely on similarity measures. Although many similarity measures have been proposed in the literature, they have not focused on users' actual interests. This paper proposes a new, efficient hybrid similarity measure for recommender systems based on user interests. It combines two novel base similarity measures: the user interest-user interest similarity measure and the user interest-item similarity measure. The hybrid measure improves on existing work in three respects. First, it improves current recommender systems by using actual user interests. Second, it offers an efficient solution to the cold-start problem, which we evaluate comprehensively. Third, it works well even when two users have no co-rated items. Our experiments show that the proposed measure performs well in terms of accuracy, execution time, and applicability. Specifically, it achieves a mean absolute error (MAE) as low as 0.42, with 64% applicability and an execution time as low as 0.03 s, whereas the best existing similarity measures from the literature achieve an MAE of 0.88. These results show that the proposed measure is more accurate than existing measures while also offering high applicability and a very short execution time.
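The abstract does not give formulas for the two base measures, so the following Python sketch only illustrates the general idea under stated assumptions. Each user's interests are a hypothetical weight vector over item categories, and each item carries a category vector. The user interest-user interest similarity is assumed to be the cosine of two users' interest vectors. The user interest-item similarity is assumed to be the mean cosine between one user's interests and the categories of the items the other user rated. The hybrid is assumed to be a convex combination with weight `alpha`. All function names and the toy data are illustrative, and MAE is computed in the usual way.

```python
import math

# Hypothetical toy data: interest and category weights are illustrative only.
interests = {
    "alice": {"sports": 1.0, "politics": 0.5},
    "bob": {"sports": 0.8, "economy": 0.2},
    "carol": {"economy": 1.0},
}
item_cats = {"m1": {"sports": 1.0}, "m2": {"politics": 1.0}, "m3": {"economy": 1.0}}
ratings = {  # user -> {item: rating}; alice and bob share no co-rated item
    "alice": {"m2": 4.0},
    "bob": {"m1": 5.0, "m3": 2.0},
    "carol": {"m3": 4.0, "m1": 1.0},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse {category: weight} vectors (0 if either is empty)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def interest_interest_sim(u: str, v: str) -> float:
    """Assumed user interest-user interest measure: cosine of interest vectors."""
    return cosine(interests[u], interests[v])


def interest_item_sim(u: str, v: str) -> float:
    """Assumed user interest-item measure: how well v's rated items match u's interests."""
    items = list(ratings[v])
    if not items:
        return 0.0
    return sum(cosine(interests[u], item_cats[i]) for i in items) / len(items)


def hybrid_sim(u: str, v: str, alpha: float = 0.5) -> float:
    """Convex combination of the two base measures; alpha is an assumed weight."""
    return alpha * interest_interest_sim(u, v) + (1 - alpha) * interest_item_sim(u, v)


def predict(u: str, item: str, alpha: float = 0.5):
    """Similarity-weighted mean of other users' ratings of item, or None if no one rated it."""
    num = den = 0.0
    for v, rated in ratings.items():
        if v != u and item in rated:
            s = hybrid_sim(u, v, alpha)
            num += s * rated[item]
            den += abs(s)
    return num / den if den else None


def mae(pairs) -> float:
    """Mean absolute error over (predicted, actual) pairs, skipping None predictions."""
    errors = [abs(p - a) for p, a in pairs if p is not None]
    return sum(errors) / len(errors) if errors else float("nan")


if __name__ == "__main__":
    print(round(hybrid_sim("alice", "bob"), 3))  # prints 0.657
    print(round(predict("alice", "m1"), 2))  # prints 3.98
```

In the toy data, alice and bob have rated no item in common, yet `hybrid_sim` still gives them a positive similarity. Neither assumed base measure looks at co-rated items. Here a `None` prediction is treated as "not applicable"; the abstract does not say how applicability is counted.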