This paper discusses how crowd and machine classifiers can be efficiently combined to screen items that satisfy a set of predicates. We show that this is a recurring problem in many domains, present machine-human (hybrid) algorithms that screen items efficiently, and estimate the gain over human-only or machine-only screening in terms of performance and cost. We further show how, given a new classification problem and a set of classifiers of unknown accuracy for the problem at hand, we can manage the cost-accuracy trade-off by progressively deciding whether to spend budget on obtaining test data (to assess the accuracy of the given classifiers), on training an ensemble of classifiers, or on leveraging the existing machine classifiers together with the crowd; in the latter case, we show how to efficiently combine them based on their estimated characteristics to obtain the classification. We demonstrate that the proposed techniques achieve significant cost/accuracy improvements with respect to leading classification algorithms.
ACM Reference Format: Evgeny Krivosheev, Fabio Casati, Marcos Baez, and Boualem Benatallah. 2019. Combining Crowd and Machines for Multi-predicate Item Screening. 1, 1 (April 2019), 18 pages. https://doi.org/0000001.0000001
BACKGROUND AND MOTIVATION

A frequently occurring classification problem consists in identifying items that pass a set of screening tests (filters). This is common not only in medical diagnosis but in many other fields as well: from database querying, where we filter tuples based on predicates [Parameswaran et al. 2014], to hotel search, where we filter places based on features of interest [Lan et al. 2017], to systematic literature reviews (SLR), where we screen candidate papers based on a set of exclusion criteria to assess whether they are in scope for the review [Wallace et al. 2017].

The goal of this paper is to understand how, given a set of trained classifiers whose accuracy may or may not be known for the problem at hand (for a specific query predicate, hotel feature, or paper topic), we can combine machine learning (ML) and human (H) classifiers to create a hybrid classifier that screens items efficiently in terms of the cost of querying the crowd, while ensuring an accuracy that is acceptable for the given problem. We focus specifically on the common scenario of finite pool problems, where the set of items to screen is limited and where it may therefore not be cost-effective to collect sufficient data to train accurate classifiers for each specific case. To make the paper easier to read and the problem concrete, we will often use the example of SLRs mentioned above, which is rather challenging in that each SLR is different and each filtering predicate (called an exclusion criterion in that context) could be unique to that SLR (e.g., "exclude papers that do not study adults 85+ years old").

The area of crowd-only and of hybrid (ML+H) classification has received a lot of attention in the literature. Research in crowdsourcing has identified h...
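To make the screening setting concrete, the following is a minimal illustrative sketch (not the paper's actual algorithm) of hybrid multi-predicate screening: an item is in scope only if no exclusion predicate applies, and for each predicate the machine classifier's answer is used when it is confident, falling back to a crowd majority vote otherwise. All function names, thresholds, and the decision rule are hypothetical assumptions for illustration.

```python
def decide_predicate(ml_prob, crowd_votes, hi=0.9, lo=0.1):
    """Return True if the exclusion predicate applies to the item.

    ml_prob: the machine classifier's estimated probability that the
    predicate applies. If it is confident (>= hi or <= lo), use it;
    otherwise fall back to a majority vote of crowd workers.
    Thresholds are illustrative, not taken from the paper.
    """
    if ml_prob >= hi:   # machine confident: predicate applies
        return True
    if ml_prob <= lo:   # machine confident: predicate does not apply
        return False
    # Machine uncertain: resolve by crowd majority vote.
    return sum(crowd_votes) > len(crowd_votes) / 2

def screen_item(predicate_evidence):
    """An item passes screening only if NO exclusion predicate applies.

    predicate_evidence: list of (ml_prob, crowd_votes) pairs, one per
    exclusion criterion.
    """
    return not any(decide_predicate(p, v) for p, v in predicate_evidence)

# Example: two criteria; the first is confidently inapplicable, the
# second is uncertain and resolved by three crowd votes (majority "no").
print(screen_item([(0.05, []), (0.5, [False, False, True])]))  # True
```

Because an item is excluded as soon as any predicate fires, a cost-aware scheduler can query the cheapest or most discriminative predicates first and skip the rest, which is where the cost savings of hybrid screening come from.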