Abstract. Crowdsourcing services have proven efficient at collecting large amounts of labeled data for supervised learning, but the low cost of crowd workers leads to unreliable labels. Although various methods have been proposed to infer the ground truth or to learn from crowd data directly, there is no guarantee that they work well when crowd labels are highly biased or noisy. Motivated by this limitation of crowd data, we propose to improve the performance of crowdsourced learning tasks with some additional expert labels, treating each labeler as a personal classifier and combining all labelers' opinions from a model-combination perspective. Experiments show that our method can significantly improve learning quality compared with methods that use only crowd labels.
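As a rough illustration of the model-combination idea, and not the authors' actual method, each crowd worker can be treated as a binary classifier whose vote is weighted by how well it agrees with the few expert labels. The sketch below assumes binary labels in {0, 1}, a hypothetical dict-of-dicts data format, and a log-odds weighting of each worker's smoothed accuracy on the expert-labeled items. The function names `estimate_weights` and `combine` are illustrative only.

```python
import math
from collections import defaultdict


def estimate_weights(crowd, expert, smoothing=1.0):
    """Weight each worker by the log-odds of their accuracy on expert-labeled items.

    crowd:  hypothetical format, item -> {worker: label in {0, 1}}
    expert: item -> true label, for the (small) expert-labeled subset
    """
    correct = defaultdict(float)
    total = defaultdict(float)
    for item, truth in expert.items():
        for worker, label in crowd.get(item, {}).items():
            total[worker] += 1
            correct[worker] += label == truth
    weights = {}
    for worker in total:
        # Laplace smoothing keeps the accuracy strictly between 0 and 1.
        acc = (correct[worker] + smoothing) / (total[worker] + 2 * smoothing)
        # Positive for better-than-chance workers, negative for worse-than-chance ones.
        weights[worker] = math.log(acc / (1 - acc))
    return weights


def combine(labels, weights):
    """Weighted vote over one item's binary crowd labels (ties resolve to 0).

    Workers never seen on expert items get weight 0, i.e. they are ignored.
    A negative weight effectively flips that worker's vote.
    """
    score = sum(weights.get(w, 0.0) * (1 if y == 1 else -1) for w, y in labels.items())
    return 1 if score > 0 else 0
```

With this weighting, a single worker who is consistently right on the expert items can outvote a majority of workers who are consistently wrong, which plain majority voting cannot do.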
Data sets collected from crowdsourcing platforms are well known for their low cost. But low cost may lead to low quality: labels may be incorrect or missing. Most existing work focuses on modeling the labeling errors of crowd workers, but missing labels can also cause problems when modeling the data. In this paper, we present an algorithm to predict the missing labels of crowd workers, in which we adopt ideas from semi-supervised learning and exploit the particular consistency between crowd workers. We also define the consistency between workers in terms of their crowd labels and develop an algorithm to learn it from the data automatically. Experiments on both benchmark and real data show that our algorithm outperforms traditional semi-supervised learning algorithms in predicting missing labels, and that the recovered crowd labels can predict the ground truth and reflect real properties of crowd workers.
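To make the notion of worker consistency concrete, here is a much simpler stand-in than the learned, semi-supervised approach described above. It defines the consistency of two workers as their agreement rate on items both have labeled, then fills a missing label by a consistency-weighted vote of the other workers on that item. Binary labels in {0, 1}, the data format, and the names `consistency` and `predict_missing` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def consistency(crowd, min_overlap=1):
    """Pairwise agreement rate between workers on the items both have labeled.

    crowd: hypothetical format, item -> {worker: label in {0, 1}}
    Returns a symmetric dict keyed by (worker, worker) pairs.
    """
    agree = defaultdict(int)
    overlap = defaultdict(int)
    for labels in crowd.values():
        for u, v in combinations(sorted(labels), 2):
            overlap[(u, v)] += 1
            agree[(u, v)] += labels[u] == labels[v]
    cons = {}
    for pair, n in overlap.items():
        if n >= min_overlap:  # skip pairs with too little shared evidence
            rate = agree[pair] / n
            cons[pair] = cons[pair[::-1]] = rate
    return cons


def predict_missing(crowd, worker, item, cons):
    """Fill in `worker`'s missing binary label on `item` from peers who labeled it."""
    score = 0.0
    for peer, label in crowd[item].items():
        if peer == worker or (worker, peer) not in cons:
            continue
        # 2*rate - 1 is +1 for a peer who always agrees and -1 for one who always disagrees,
        # so a reliably contrarian peer's label counts as evidence for the opposite label.
        score += (2 * cons[(worker, peer)] - 1) * (1 if label == 1 else -1)
    return 1 if score > 0 else 0  # ties resolve to 0
```

A fixed agreement rate like this ignores how difficult each item is and needs co-labeled items for every pair, which is one motivation for learning the consistency from the data instead.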