Clustering in recommender systems (RSs) is an important issue, both for improving collaborative filtering (CF) accuracy and for obtaining analytical information from highly sparse datasets. RS items and users usually share features belonging to different clusters, e.g., a musical-comedy movie. Soft clustering is therefore the most natural approach to CF clustering. In this paper, we propose a new prediction approach for probabilistic soft clustering methods. In addition, we test a nontraditional CF dataset from the scientific documentation field, SD4AI, and compare the results with the MovieLens baseline. Nontraditional CF datasets have challenging features, such as irregular rating frequency distributions, a broad range of rating values, and particularly high sparsity. The results show the suitability of soft-clustering approaches, whose probabilistic overlapping parameters reach their optimum values when a balance between hard and soft clustering is used. This paper opens some promising lines of research, such as the use of RSs in the scientific documentation field, the processing of Internet of Things-based datasets, and the design of new model-based soft clustering methods.
INDEX TERMS Soft clustering, scientific documentation, collaborative filtering, recommender systems.

JESÚS BOBADILLA received the B.S. degree in computer science from the Universidad Politécnica de Madrid and the Ph.D. degree in computer science from Universidad Carlos III.