Combining Crowd and Machines for Multi-predicate Item Screening

Krivosheev, Evgeny; Casati, Fabio; Báez, Marcos; Benatallah, Boualem

doi:10.1145/3274366

Cited by 17 publications

(11 citation statements)

References 28 publications


“…Where multiple elements of the articles need to be assessed, machine learning can require considerable costs related to training. Although a comparison of crowdsourcing with text-mining performance is valid, it is also worth considering that combining machine learning and crowdsourcing may lead to the greatest workload reduction for the crowd and investigative teams [43,44]. This hybrid approach has been researched and applied in a variety of fields outside the SR field.…”

Section: Discussion (mentioning)

confidence: 99%

Crowdsourcing the Citation Screening Process for Systematic Reviews: Validation Study

Nama, Sampson, Barrowman et al. 2019

J Med Internet Res


Background: Systematic reviews (SRs) are often cited as the highest level of evidence available as they involve the identification and synthesis of published studies on a topic. Unfortunately, it is increasingly challenging for small teams to complete SR procedures in a reasonable time period, given the exponential rise in the volume of primary literature. Crowdsourcing has been postulated as a potential solution.

Objective: The feasibility objective of this study was to determine whether a crowd would be willing to perform and complete abstract and full text screening. The validation objective was to assess the quality of the crowd’s work, including retention of eligible citations (sensitivity) and work performed for the investigative team, defined as the percentage of citations excluded by the crowd.

Methods: We performed a prospective study evaluating crowdsourcing essential components of an SR, including abstract screening, document retrieval, and full text assessment. Using CrowdScreenSR citation screening software, 2323 articles from 6 SRs were available to an online crowd. Citations excluded by less than or equal to 75% of the crowd were moved forward for full text assessment. For the validation component, performance of the crowd was compared with citation review through the accepted, gold standard, trained expert approach.

Results: Of 312 potential crowd members, 117 (37.5%) commenced abstract screening and 71 (22.8%) completed the minimum requirement of 50 citation assessments. The majority of participants were undergraduate or medical students (192/312, 61.5%). The crowd screened 16,988 abstracts (median: 8 per citation; interquartile range [IQR] 7-8), and all citations achieved the minimum of 4 assessments after a median of 42 days (IQR 26-67). Crowd members retrieved 83.5% (774/927) of the articles that progressed to the full text phase. A total of 7604 full text assessments were completed (median: 7 per citation; IQR 3-11). Citations from all but 1 review achieved the minimum of 4 assessments after a median of 36 days (IQR 24-70), with 1 review remaining incomplete after 3 months. When complete crowd member agreement at both levels was required for exclusion, sensitivity was 100% (95% CI 97.9-100) and work performed was calculated at 68.3% (95% CI 66.4-70.1). Using the predefined alternative 75% exclusion threshold, sensitivity remained 100% and work performed increased to 72.9% (95% CI 71.0-74.6; P<.001). Finally, when a simple majority threshold was considered, sensitivity decreased marginally to 98.9% (95% CI 96.0-99.7; P=.25) and work performed increased substantially to 80.4% (95% CI 78.7-82.0; P<.001).

Conclusions: Crowdsourcing of citation screening for SRs is feasible and has reasonable sensitivity and specificity. By expediting the screening process, crowdsourcing could permit the i...


“…In this section, we examine the behavior of AL approaches in crowdsourcing settings. Specifically, we focus on problems where we start from a blank slate, have a pool of items to classify and a crowd at our disposal, and need not only to choose/assess AL approaches but also to assess if the crowd is leveraged only to get labeled data for training or also to perform classification at inference time, as done in hybrid classification contexts (Krivosheev et al 2018a; Callaghan et al 2018).…”

Section: Experimental Work (mentioning)

confidence: 99%

“…The Amazon Sentiment-1 dataset (Krivosheev et al 2018a) includes annotations about deciding whether the given product review belongs to a book or not. Similarly, the Amazon Sentiment-2 dataset (Krivosheev et al 2018a) includes annotations about whether the given product review has a negative or positive sentiment. The Crisis-1 dataset (Imran et al 2013) consists of human-labeled tweets collected during the 2012 Hurricane Sandy and the 2011 Joplin tornado.…”

Section: Datasets (mentioning)

confidence: 99%

A review and experimental analysis of active learning over crowdsourced data

et al. 2021

Self Cite


Training data creation is increasingly a key bottleneck for developing machine learning, especially for deep learning systems. Active learning provides a cost-effective means for creating training data by selecting the most informative instances for labeling. Labels in real applications are often collected from crowdsourcing, which engages online crowds for data labeling at scale. Despite the importance of using crowdsourced data in the active learning process, an analysis of how the existing active learning approaches behave over crowdsourced data is currently missing. This paper aims to fill this gap by reviewing the existing active learning approaches and then testing a set of benchmarking ones on crowdsourced datasets. We provide a comprehensive and systematic survey of the recent research on active learning in the hybrid human–machine classification setting, where crowd workers contribute labels (often noisy) to either directly classify data instances or to train machine learning models. We identify three categories of state-of-the-art active learning methods according to whether and how predefined queries are employed for data sampling, namely fixed-strategy approaches, dynamic-strategy approaches, and strategy-free approaches. We then conduct an empirical study on their cost-effectiveness, showing that the performance of the existing active learning approaches is affected by many factors in hybrid classification contexts, such as the noise level of data, label fusion technique used, and the specific characteristics of the task. Finally, we discuss challenges and identify potential directions to design active learning strategies for hybrid classification problems.


“…Text classification, in particular, is a recurrent goal of machine learning (ML) projects, and a typical task in crowdsourcing platforms. Hybrid approaches, combining ML and crowd efforts, have been proposed to boost accuracy and reduce costs [2–4]. One possibility is to use automatic techniques for highlighting relevant excerpts in the text and then ask workers to classify.…”

Section: Objective (mentioning)

confidence: 99%

Crowdsourced dataset to study the generation and impact of text highlighting in classification tasks

Ramírez, Báez, Casati et al. 2019

BMC Res Notes

Self Cite


Objectives: Text classification is a recurrent goal in machine learning projects and a typical task in crowdsourcing platforms. Hybrid approaches, leveraging crowdsourcing and machine learning, work better than either in isolation and help to reduce crowdsourcing costs. One way to mix crowd and machine efforts is to have algorithms highlight passages from texts and feed these to the crowd for classification. In this paper, we present a dataset to study text highlighting generation and its impact on document classification.

Data description: The dataset was created through two series of experiments where we first asked workers to (i) classify documents according to a relevance question and to highlight parts of the text that supported their decision, and on a second phase, (ii) to assess document relevance but supported by text highlighting of varying quality (six human-generated and six machine-generated highlighting conditions). The dataset features documents from two application domains: systematic literature reviews and product reviews, three document sizes, and three relevance questions of different levels of difficulty. We expect this dataset of 27,711 individual judgments from 1851 workers to benefit not only this specific problem domain, but the larger class of classification problems where crowdsourced datasets with individual judgments are scarce.
