Machine learning algorithms and recommender systems trained on human ratings are in wide use today. However, human ratings can carry a high level of uncertainty and are subjective, influenced by demographic or psychological factors. We propose a new approach to the design of object classes from human ratings: the use of entire distributions to construct classes. By avoiding aggregation for class definition, our approach loses no information and can deal with highly volatile or conflicting ratings. The approach is based on the concept of the Earth Mover's Distance (EMD), a measure of distance between distributions. We evaluate the proposed approach on four datasets obtained from diverse Web content or movie quality evaluation services or experiments. We show that clusters discovered in these datasets using the EMD measure have a consistent and simple interpretation. Quality classes defined using entire rating distributions can be fitted to clusters of distributions in the four datasets using two parameters, resulting in a good overall fit. We also consider how the composition of small samples affects the distributions that are the basis of our classification approach. We show that distributions based on small samples of 10 evaluations remain robust to several demographic and psychological variables. This observation suggests that the proposed approach can be used in practice for quality evaluation, even for highly uncertain and subjective ratings.
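To illustrate why comparing whole rating distributions keeps information that aggregation discards, here is a minimal sketch of the EMD between two rating histograms. It assumes a shared ordinal scale with unit spacing between adjacent ratings, in which case the EMD equals the sum of absolute differences between the two cumulative distributions. The function names and the example histograms are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def normalize(counts):
    """Convert raw rating counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("rating histogram is empty")
    return [c / total for c in counts]


def emd_1d(counts_a, counts_b):
    """EMD between two histograms on the same ordinal scale.

    Assumes unit distance between adjacent rating levels, so the EMD
    is the sum of absolute differences of the cumulative distributions.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must use the same rating scale")
    cdf_a = accumulate(normalize(counts_a))
    cdf_b = accumulate(normalize(counts_b))
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Two hypothetical items, each rated by 10 people on a 1-5 scale.
# Both have a mean rating of 3, so averaging cannot tell them apart.
polarized = [5, 0, 0, 0, 5]   # half the raters gave 1, half gave 5
consensus = [0, 0, 10, 0, 0]  # every rater gave 3

distance = emd_1d(polarized, consensus)  # 2.0
```

The two items share the same mean, yet their EMD is 2.0 because half of the rating mass has to move two steps from each end of the scale to the middle. A pairwise distance of this kind could then be passed to any clustering method that accepts a precomputed distance matrix.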
Abstract: This article is a result of the AAIA'17 Data Mining Challenge: Helping AI to Play Hearthstone. The goal of the Challenge was to correctly predict which bot would win a bot-vs-bot Hearthstone match based on what was known at a given time. Hearthstone is an online two-player card game with imperfect information (unlike chess and Go, and like poker), in which each player tries to defeat the opponent by reducing the opponent's "life points" to zero while preventing the opponent from doing the same. Two main challenges were present: transforming hierarchically structured data into a two-dimensional matrix, and dealing with non-IID data (certain cards were present only in the test data). We present a way to cope with these complications while using state-of-the-art machine learning algorithms (e.g. Microsoft's LightGBM).
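The first challenge, turning nested game states into a two-dimensional matrix, can be sketched as follows. This is only an illustration under stated assumptions: the field names (`turn`, `hero`, `played_minions`, `attack`, `hp`) are hypothetical and are not the challenge's actual schema, and the aggregation shown is one plausible choice, not the method described in the article. Summarizing minions by their statistics rather than by card identity is assumed here as one way to keep the columns meaningful for cards that appear only in test data.

```python
def flatten_state(state):
    """Turn one nested game-state dict into a flat dict of numeric features.

    All keys read from `state` are hypothetical placeholders.
    """
    row = {"turn": state["turn"]}
    for side in ("player", "opponent"):
        info = state[side]
        minions = info["played_minions"]
        row[f"{side}_hp"] = info["hero"]["hp"]
        # Aggregate over minions so the row width does not depend on
        # how many cards are on the board or which cards they are.
        row[f"{side}_minion_count"] = len(minions)
        row[f"{side}_minion_attack_sum"] = sum(m["attack"] for m in minions)
        row[f"{side}_minion_hp_sum"] = sum(m["hp"] for m in minions)
    return row


def to_matrix(states):
    """Flatten many states into (column names, list of equal-length rows)."""
    rows = [flatten_state(s) for s in states]
    columns = sorted(rows[0]) if rows else []
    return columns, [[r[c] for c in columns] for r in rows]


# A single made-up game state for demonstration.
example_state = {
    "turn": 7,
    "player": {
        "hero": {"hp": 21},
        "played_minions": [{"attack": 3, "hp": 2}, {"attack": 1, "hp": 5}],
    },
    "opponent": {
        "hero": {"hp": 14},
        "played_minions": [],
    },
}

columns, matrix = to_matrix([example_state])
```

The resulting `matrix` has one row per game state and one column per feature, a shape that tabular learners such as gradient-boosted trees expect as input.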