Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study

Kentley, Jonathan; Weber, Jochen; Liopyris, Konstantinos; Braun, Ralph P.; Marghoob, Ashfaq A.; Quigley, Elizabeth; Nelson, Kelly C.; Prentice, Kira; Duhaime, Erik P.; Halpern, Allan C.; Rotemberg, Veronica

doi:10.2196/38412

Cited by 11 publications (13 citation statements); references 56 publications.


“…Several such projects can take months to obtain high-quality labels (Cocos et al, 2017). Third, cost is a major factor in being able to determine the viability of such a project (Kentley et al, 2023; Ørting et al, 2020). By paying the crowd-sourced workers a total of in daily rewards over 14 days, Centaur Labs obtained 143,209 classification labels.…”

Section: Discussion (mentioning; confidence: 99%)

“…This bolsters some of the wisdom of the crowd findings where novices, such as undergraduate psychology students, could learn to classify white blood cell images which when combined together exceeded expert performance (Hasan et al, 2023). Non-experts recruited in DiagnosUs with Centaur Labs showed that with a little training, crowds could identify complex lesion attributes (Kentley et al, 2023). This opens up the possibility of expanding the scope of citizen science projects (Cohn, 2008; Sullivan et al, 2014).…”

Section: Discussion (mentioning; confidence: 99%)

“…The future of medical artificial intelligence (AI) relies on the existence of large, high-quality labeled biomedical image datasets for machine learning training (Ørting et al, 2020; Codella et al, 2019; Tschandl et al, 2018). Currently, the lack of such datasets is considered one of the largest bottlenecks in the development and training of medical AI systems (Ørting et al, 2020; Kentley et al, 2023; Duhaime et al, 2023). Traditionally, these datasets have been meticulously curated based on the consensus of expert medical professionals (Tschandl et al, 2018; van der Wal et al, 2021).…”

Section: Introduction (mentioning; confidence: 99%)

“…In contrast, the labeling of datasets involving everyday objects, such as ImageNet, scales easily through the use of online crowdsourcing (Deng et al, 2009). Thus, some researchers and entrepreneurs have suggested that labeling medical images through crowdsourcing might provide one solution to the medical AI data bottleneck (Ørting et al, 2020; Alialy et al, 2018; Kentley et al, 2023; Duhaime et al, 2023).…”

Section: Introduction (mentioning; confidence: 99%)

“…Translating the wisdom of the crowds from a controlled lab environment to a real-world application requires the testing and development of scalable systems that can acquire a large number of decisions in a short time at low costs (Kentley et al, 2023; Ørting et al, 2020; Duhaime et al, 2023). A company—Centaur Labs—developed an app-based platform where individuals with a varying range of experience and expertise sign up to provide medical decisions (Press, 2021; Duhaime et al, 2023).…”

Section: Introduction (mentioning; confidence: 99%)

Boosting wisdom of the crowd for medical image annotation using training performance and task features

Hasan, Duhaime, Trueblood (2024). Cogn. Research. Self-citation.

A crucial bottleneck in medical artificial intelligence (AI) is high-quality labeled medical datasets. In this paper, we test a large variety of wisdom of the crowd algorithms to label medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin Lesion Challenge 2018 into 7 different categories. There was a large dispersion in the geographical location, experience, training, and performance of the recruited individuals. We tested several wisdom of the crowd algorithms of varying complexity, from a simple unweighted average to more complex Bayesian models that account for individual patterns of errors. Using a switchboard analysis, we observe that the best-performing algorithms rely on selecting top performers, weighting decisions by training accuracy, and taking into account the task environment. These algorithms far exceed expert performance. We conclude by discussing the implications of these approaches for the development of medical AI.
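The abstract contrasts a simple unweighted aggregate with algorithms that select top performers and weight their votes by training accuracy. The sketch below illustrates only those two simple ideas for a single image; it does not reproduce the paper's Bayesian models or its task-environment features. The function names, the annotator IDs, and the training-accuracy values are hypothetical and chosen for illustration; "melanoma" and "nevus" stand in for two of the 7 lesion categories.

```python
from collections import defaultdict


def plurality_vote(votes):
    """Unweighted aggregate: every annotator's label counts once.

    votes: list of (annotator_id, label) pairs for one image.
    """
    counts = defaultdict(float)
    for _, label in votes:
        counts[label] += 1.0
    # Iterate labels in sorted order so ties resolve to the
    # alphabetically first label (max keeps the first maximum).
    return max(sorted(counts), key=lambda lab: counts[lab])


def weighted_top_k_vote(votes, train_acc, k):
    """Keep the k annotators with the highest training accuracy,
    then weight each remaining vote by that accuracy.

    train_acc: hypothetical mapping annotator_id -> accuracy on
    training items with known labels.
    """
    voters = {annotator for annotator, _ in votes}
    # Rank by accuracy (descending), breaking ties by ID.
    top = set(sorted(voters, key=lambda a: (-train_acc[a], a))[:k])
    scores = defaultdict(float)
    for annotator, label in votes:
        if annotator in top:
            scores[label] += train_acc[annotator]
    return max(sorted(scores), key=lambda lab: scores[lab])


# Hypothetical example: five crowd labels for one lesion image.
votes = [("a1", "nevus"), ("a2", "melanoma"), ("a3", "nevus"),
         ("a4", "melanoma"), ("a5", "nevus")]
train_acc = {"a1": 0.40, "a2": 0.90, "a3": 0.45, "a4": 0.85, "a5": 0.50}

print(plurality_vote(votes))                    # nevus (3 votes vs 2)
print(weighted_top_k_vote(votes, train_acc, 3))  # melanoma (1.75 vs 0.50)
```

In this toy case the majority of less accurate annotators outvotes the two most accurate ones under plain plurality, while selecting and weighting by training accuracy reverses the outcome; this is the kind of effect the abstract attributes to its best-performing algorithms, though the real models are considerably richer.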


Towards Expert-Amateur Collaboration: Prototypical Label Isolation Learning for Left Atrium Segmentation with Mixed-Quality Labels

Xu, Yan, et al. (2023). Lecture Notes in Computer Science.

Quality Assured: Rethinking Annotation Strategies in Imaging AI

Rädsch, Reinke, Weru, et al. (2024). Lecture Notes in Computer Science.

This paper does not describe a novel method. Instead, it studies an essential foundation for reliable benchmarking and ultimately real-world application of AI-based image analysis: generating high-quality reference annotations. Previous research has focused on crowdsourcing as a means of outsourcing annotations. However, little attention has so far been given to annotation companies, specifically regarding their internal quality assurance (QA) processes. Therefore, our aim is to evaluate the influence of QA employed by annotation companies on annotation quality and devise methodologies for maximizing data annotation efficacy. Based on a total of 57,648 instance-segmented images obtained from a total of 924 annotators and 34 QA workers from four annotation companies and Amazon Mechanical Turk (MTurk), we derived the following insights: (1) Annotation companies perform better both in terms of quantity and quality compared to the widely used platform MTurk. (2) Annotation companies’ internal QA only provides marginal improvements, if any. However, improving labeling instructions instead of investing in QA can substantially boost annotation performance. (3) The benefit of internal QA depends on specific image characteristics. Our work could enable researchers to derive substantially more value from a fixed annotation budget and change the way annotation companies conduct internal QA.
