Systematic literature reviews (SLRs) are among the most common and useful forms of scientific research and publication. Tens of thousands of SLRs are published each year, and this rate is growing across all fields of science. Performing an accurate, complete, and unbiased SLR is, however, a difficult and expensive endeavor. This is true in general for all phases of a literature review, and in particular for the paper screening phase, where authors filter a set of potentially in-scope papers based on a number of exclusion criteria. To address the problem, in recent years the research community has begun to explore the use of the crowd to enable faster, more accurate, cheaper, and less biased screening of papers. Initial results show that crowdsourcing can be effective, even for relatively complex reviews. In this paper we derive and analyze a set of strategies for crowd-based screening, and show that an adaptive strategy, which continuously re-assesses the statistical properties of the problem to minimize the number of votes needed to reach a decision for each paper, significantly outperforms a number of non-adaptive approaches in terms of cost and accuracy. We validate both the applicability and the results of the approach through a set of crowdsourcing experiments, and discuss properties of the problem and algorithms that we believe to be of general interest for classification problems where items are classified via a series of successive tests (as often happens in medicine).
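To illustrate the idea of adaptively deciding how many votes to collect per paper, the following minimal sketch keeps querying workers until a Bayesian posterior over "exclude" vs. "include" is confident enough. It is only an illustration of the adaptive principle, not the paper's exact algorithm; the prior, the assumed worker accuracy, the stopping thresholds, and the `get_vote` callable are all hypothetical choices.

```python
# Minimal sketch of an adaptive vote-collection loop for paper screening.
# worker_accuracy, prior, and the stopping thresholds are illustrative
# assumptions, not the parameters used in the paper.

def posterior_exclude(votes, prior=0.5, worker_accuracy=0.8):
    """Posterior probability that a paper should be excluded, given binary
    votes (True = 'exclude') from workers of assumed, equal accuracy."""
    p = prior
    for v in votes:
        like_ex = worker_accuracy if v else 1 - worker_accuracy
        like_in = 1 - worker_accuracy if v else worker_accuracy
        p = (p * like_ex) / (p * like_ex + (1 - p) * like_in)
    return p

def screen(get_vote, max_votes=10, lo=0.01, hi=0.99):
    """Collect votes one at a time; stop as soon as the posterior is
    confident enough in either direction or the vote budget is spent."""
    votes = []
    while len(votes) < max_votes:
        votes.append(get_vote())  # ask one more crowd worker
        p = posterior_exclude(votes)
        if p >= hi:
            return "EXCLUDE", votes
        if p <= lo:
            return "INCLUDE", votes
    return ("EXCLUDE" if posterior_exclude(votes) >= 0.5 else "INCLUDE"), votes
```

The key property, as in the abstract above, is that easy papers terminate after very few votes while ambiguous ones receive more, which is where the cost savings over fixed-vote strategies come from.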
This paper discusses how crowd and machine classifiers can be efficiently combined to screen items that satisfy a set of predicates. We show that this is a recurring problem in many domains, present machine-human (hybrid) algorithms that screen items efficiently, and estimate the gain over human-only or machine-only screening in terms of performance and cost. We further show how, given a new classification problem and a set of classifiers of unknown accuracy for the problem at hand, we can manage the cost-accuracy trade-off by progressively determining whether we should spend budget to obtain test data (to assess the accuracy of the given classifiers), to train an ensemble of classifiers, or to leverage the existing machine classifiers together with the crowd, and in the latter case how to efficiently combine them based on their estimated characteristics to obtain the classification. We demonstrate that the techniques we propose obtain significant cost/accuracy improvements with respect to the leading classification algorithms.
BACKGROUND AND MOTIVATION
A frequently occurring classification problem consists in identifying items that pass a set of screening tests (filters). This is common not only in medical diagnosis but in many other fields as well, from database querying, where we filter tuples based on predicates [Parameswaran et al. 2014], to hotel search, where we filter places based on features of interest [Lan et al. 2017], to systematic literature reviews (SLRs), where we screen candidate papers based on a set of exclusion criteria to assess whether they are in scope for the review [Wallace et al. 2017]. The goal of this paper is to understand how, given a set of trained classifiers whose accuracy may or may not be known for the problem at hand (for a specific query predicate, hotel feature, or paper topic), we can combine machine learning (ML) and human (H) classifiers to create a hybrid classifier that screens items efficiently in terms of the cost of querying the crowd, while ensuring an accuracy that is acceptable for the given problem. We focus specifically on the common scenario of finite-pool problems, where the set of items to screen is limited and where, therefore, it may not be cost-effective to collect sufficient data to train accurate classifiers for each specific case. To make the paper easier to read and the problem concrete, we will often take the example of SLRs mentioned above, which is rather challenging in that each SLR is different and each filtering predicate (called an exclusion criterion in that context) could be unique to each SLR (e.g., "exclude papers that do not study adults 85+ years old"). The area of crowd-only and of hybrid (ML+H) classification has received a lot of attention in the literature. Research in crowdsourcing has identified h...
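A hedged sketch of the multi-predicate screening pattern described above: a machine classifier handles each predicate when it is confident, uncertain item/predicate pairs are routed to the crowd, and an item is screened out as soon as any exclusion predicate fires. The function names (`machine_prob`, `crowd_label`) and the confidence threshold are assumptions for illustration, not the authors' actual interfaces.

```python
# Illustrative hybrid multi-predicate screening (not the paper's exact
# algorithm). machine_prob(item, p) returns the classifier's probability that
# predicate p excludes the item; crowd_label(item, p) returns an aggregated
# crowd decision (True = exclude).

def hybrid_screen(item, predicates, machine_prob, crowd_label, confident=0.9):
    for p in predicates:
        prob = machine_prob(item, p)
        if prob >= confident:
            return "OUT"                 # machine is sure the predicate applies
        if prob > 1 - confident:         # machine is unsure: ask the crowd
            if crowd_label(item, p):
                return "OUT"
        # otherwise the machine is confident the predicate does not apply
    return "IN"                          # no exclusion predicate fired
```

The short-circuit on the first firing predicate is what makes multi-predicate screening cheaper than classifying every predicate for every item, since excluded items rarely need all predicates evaluated.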
Training data creation is increasingly a key bottleneck for developing machine learning, especially for deep learning systems. Active learning provides a cost-effective means of creating training data by selecting the most informative instances for labeling. Labels in real applications are often collected through crowdsourcing, which engages online crowds for data labeling at scale. Despite the importance of using crowdsourced data in the active learning process, an analysis of how existing active learning approaches behave over crowdsourced data is currently missing. This paper aims to fill this gap by reviewing the existing active learning approaches and then testing a set of benchmark approaches on crowdsourced datasets. We provide a comprehensive and systematic survey of the recent research on active learning in the hybrid human-machine classification setting, where crowd workers contribute labels (often noisy) either to directly classify data instances or to train machine learning models. We identify three categories of state-of-the-art active learning methods according to whether and how predefined query strategies are employed for data sampling, namely fixed-strategy approaches, dynamic-strategy approaches, and strategy-free approaches. We then conduct an empirical study of their cost-effectiveness, showing that the performance of existing active learning approaches is affected by many factors in hybrid classification contexts, such as the noise level of the data, the label fusion technique used, and the specific characteristics of the task. Finally, we discuss challenges and identify potential directions for designing active learning strategies for hybrid classification problems.
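As a concrete example of the fixed-strategy category with crowd-contributed labels, the sketch below runs uncertainty sampling with simple majority-vote label fusion. It is a generic illustration under stated assumptions (a simulated `ask_crowd` oracle, three votes per item, a seed set that is assumed to cover both classes), not a reproduction of any benchmarked method from the survey.

```python
# Uncertainty sampling with majority-vote label fusion: a minimal
# "fixed-strategy" active learner over crowdsourced labels.
# ask_crowd(i) is a hypothetical oracle returning one (possibly noisy)
# worker label for item i.

from collections import Counter
import numpy as np
from sklearn.linear_model import LogisticRegression

def majority_vote(votes):
    return Counter(votes).most_common(1)[0][0]

def active_learning_loop(X_pool, ask_crowd, n_rounds=20, votes_per_item=3,
                         seed_idx=(0, 1)):
    # seed set: assumed to contain at least one item of each class
    labeled = {i: majority_vote([ask_crowd(i) for _ in range(votes_per_item)])
               for i in seed_idx}
    model = LogisticRegression()
    for _ in range(n_rounds):
        idx = sorted(labeled)
        model.fit(X_pool[idx], [labeled[i] for i in idx])
        unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
        if not unlabeled:
            break
        # uncertainty sampling: pick the unlabeled item closest to p = 0.5
        probs = model.predict_proba(X_pool[unlabeled])[:, 1]
        pick = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
        labeled[pick] = majority_vote(
            [ask_crowd(pick) for _ in range(votes_per_item)])
    return model, labeled
```

Swapping the fusion step (e.g., weighted voting instead of majority vote) or the sampling criterion is exactly the kind of factor the empirical study above identifies as affecting cost-effectiveness.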
Crowdsourcing is a challenging activity for many reasons, from task design to workers' training, identification of low-quality annotators, and many more. A particularly subtle form of error is due to confusion of observations, that is, crowd workers (including diligent ones) who confuse items of a class i with items of a class j, either because the classes are similar or because the task description has failed to explain the differences. In this paper we show that confusion of observations can be a frequent occurrence in many tasks, and that such confusions cause a significant loss in accuracy. As a consequence, confusion detection is of primary importance for crowdsourced data labeling and classification. To address this problem we introduce an algorithm for confusion detection that leverages an inference procedure based on Markov Chain Monte Carlo (MCMC) sampling. We evaluate the algorithm via both synthetic datasets and crowdsourcing experiments and show that it has high accuracy in confusion detection (up to 99%). We experimentally show that quality is significantly improved without sacrificing efficiency. Finally, we show that detecting confusion is important because it can alert task designers early in the crowdsourcing process and lead them to modify the task or add specific training and information to reduce the occurrence of workers' confusion. We show that even simple modifications, such as alerting workers to the risk of confusion, can improve performance significantly.
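To make the MCMC idea concrete, the toy Gibbs sampler below alternates between sampling latent true labels and per-worker confusion matrices; persistent off-diagonal mass in a worker's posterior matrix flags which class pairs that worker confuses. This is a generic sketch of the inference pattern, not the paper's model; the Dirichlet prior, the initialization, and the absence of a burn-in phase are simplifying assumptions.

```python
# Toy Gibbs-sampling sketch for confusion detection. votes maps each item to a
# list of (worker, label) pairs over K classes. Large off-diagonal entries in a
# worker's posterior-mean confusion matrix indicate systematic class confusion.

import numpy as np

def gibbs_confusion(votes, K, n_iter=500, alpha=1.0, rng=None):
    rng = np.random.default_rng(rng)
    items = list(votes)
    workers = sorted({w for v in votes.values() for w, _ in v})
    # initialize true labels with majority vote, confusion matrices uniformly
    z = {i: int(np.bincount([l for _, l in votes[i]], minlength=K).argmax())
         for i in items}
    conf = {w: np.full((K, K), 1.0 / K) for w in workers}
    conf_sum = {w: np.zeros((K, K)) for w in workers}
    for _ in range(n_iter):
        # 1. sample each item's true label given the confusion matrices
        for i in items:
            logp = np.zeros(K)
            for w, l in votes[i]:
                logp += np.log(conf[w][:, l] + 1e-12)
            p = np.exp(logp - logp.max())
            z[i] = rng.choice(K, p=p / p.sum())
        # 2. sample each worker's confusion matrix from a Dirichlet posterior
        counts = {w: np.full((K, K), alpha) for w in workers}
        for i in items:
            for w, l in votes[i]:
                counts[w][z[i], l] += 1
        for w in workers:
            conf[w] = np.array([rng.dirichlet(row) for row in counts[w]])
            conf_sum[w] += conf[w]
    # posterior mean per worker (no burn-in discarded, for brevity)
    return {w: conf_sum[w] / n_iter for w in workers}
```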