Learning to rank from a noisy crowd

Kumar, Abhimanu; Lease, Matthew

doi:10.1145/2009916.2010129

Cited by 13 publications

(12 citation statements)

References 5 publications


“…A variety of methods have been proposed to assess the quality of judgments from turkers. Kumar and Lease [23,24] presented a weighted voting method based on turkers’ accuracies, which can be estimated by taking the full set of labels into account. Jung and Lease [25] conducted a large-scale consensus study on relevant judgements between query/document pairs for Web search on the ClueWeb09 dataset [26].…”

Section: Introduction (mentioning)

confidence: 99%

Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing

Zhai, Lingren, Deléger et al. 2013

J Med Internet Res


Background: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora were never exceptional when compared to traditionally-developed gold standards. The previously reported results on medical named entity annotation task showed a 0.68 F-measure based agreement between crowdsourced and traditionally-developed corpora.

Objective: Building upon previous work from the general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain with special emphasis on achieving high agreement between crowdsourced and traditionally-developed corpora.

Methods: To build the gold standard for evaluating the crowdsourcing workers’ performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd’s work and tested the statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally-developed annotations.

Results: The agreement between the crowd’s annotations and the traditionally-generated corpora was high for: (1) annotations (0.87, F-measure for medication names; 0.73, medication types), (2) correction of previous annotations (0.90, medication names; 0.76, medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally-generated corpora. Our results showed a 27.9% improvement over previously reported results on medication named entity annotation task.

Conclusions: This study offers three contributions. First, we proved that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collect high-quality annotations for clinical text (when protected health information was excluded). We believe that well-designed user interfaces and rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release t...


“…However, this assumption is rarely satisfied in real-world datasets. The accuracy levels of different users are considered in (Kumar and Lease 2011), which assumes that each user is correct with a certain probability and studies the problem via simulation methods such as naive Bayes and majority voting. In their pioneering work, (Chen et al 2013) studied rank aggregation in a crowd-sourcing environment for pairwise comparisons, modeled via the BTL or TCV model, where noisy BTL comparisons are assumed to be further corrupted.…”

Section: Additional Related Work (mentioning)

confidence: 99%
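
The statement above contrasts majority voting with aggregation that accounts for each user's accuracy. A minimal sketch of the difference for binary labels follows. It is an illustration, not the implementation from Kumar and Lease: the function names, the uniform class prior, the assumption that workers err independently, and the example accuracies are all hypothetical.

```python
import math

def majority_vote(labels):
    """labels: dict mapping worker id -> 0/1 label. Ties resolve to 0."""
    ones = sum(labels.values())
    return 1 if 2 * ones > len(labels) else 0

def weighted_vote(labels, accuracy):
    """Naive Bayes style vote: each worker contributes the log-odds of
    their (assumed known) accuracy, signed by the label they gave."""
    score = 0.0
    for worker, label in labels.items():
        # Clamp to avoid infinite weights for accuracies of exactly 0 or 1.
        p = min(max(accuracy[worker], 1e-6), 1 - 1e-6)
        weight = math.log(p / (1 - p))
        score += weight if label == 1 else -weight
    return 1 if score > 0 else 0

# Hypothetical example: one reliable worker disagrees with two noisy ones.
votes = {"w1": 1, "w2": 0, "w3": 0}
acc = {"w1": 0.95, "w2": 0.60, "w3": 0.60}

print(majority_vote(votes))       # the two noisy workers outvote w1
print(weighted_vote(votes, acc))  # w1's log-odds weight dominates
```

In this example the two rules disagree: majority voting follows the two less accurate workers, while the weighted vote follows the single accurate one. In practice the accuracies are not known in advance and would have to be estimated, for example from gold labels or from the full set of labels.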

“…We present a generalization of Thurstone's model, called the heterogeneous Thurstone model (HTM), which allows users with different noise levels, as well as a certain class of adversarial users. Unlike previous efforts on rank aggregation for heterogeneous populations such as (Chen et al 2013;Kumar and Lease 2011), the proposed model maintains the generality of Thurstone's framework and thus also extends its special cases such as BTL and PL models. We evaluate the performance of the method using simulated data for different noise distributions.…”

Section: Introduction (mentioning)

confidence: 99%

Rank Aggregation via Heterogeneous Thurstone Preference Models

Jin, Xu, Gu et al. 2020

AAAI


We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users. Under this framework, we also propose a rank aggregation algorithm based on alternating gradient descent to estimate the underlying item scores and accuracy levels of different users simultaneously from noisy pairwise comparisons. We theoretically prove that the proposed algorithm converges linearly up to a statistical error which matches that of the state-of-the-art method for the single-user BTL model. We evaluate the proposed HTM model and algorithm on both synthetic and real data, demonstrating that it outperforms existing methods.


“…Previous studies highlighted the potential role of SM indicators to enhance the PMS function in this phase and, particularly, within competitive positioning: constant benchmarking with competitors, including for specific products or services; identification of market or sector trends on SM; simulation of acceptance of products or services through SM channels, suggesting that customers compare different prototypes on SM platforms (Mislove et al, 2010;Bradbury, 2011). It emerged that the main users of this information for planning activities are marketing, R&D and human resources (Leonardi and Barley, 2008;Kumar and Lease, 2011), as they have information about the market situation and customer expectation in real time.…”

Section: SM Information Use (mentioning)

confidence: 99%

Social media and performance measurement systems: towards a new model?

Sidorova, Arnaboldi, Radaelli 2016

International Journal of Productivity and Performance Management


Design/methodology/approach - The framework of the research was constructed to cover the technical component of PMS (measurement methods and indicators) and the use of the information obtained from SM. Empirically, the study is based on a set of case studies in eight companies.

Findings - The study findings offer a theoretical and empirical framework to evaluate PMS in the era of SM. It provides a classification of SM metrics, key performance indicators correlated to their use within different departments belonging to eight companies, highlighting the benefits and threats of SM information for PMS.

Research limitations/implications - The limitation of this study is the diversity of industries included into the multiple-case study. The authors choose cases with the aim of providing a broader view on the impact of SM on PMS. However, the results show the dependency of use and type of measurement on certain industries, requiring future research focused on specific sectors or PMS aspects.

Practical implications - The paper provides a map of SM information measurement methods and use, which allows companies to position themselves and examine PMS evolution.

Originality/value - The results of the paper propose a holistic model, employing SM as a new variable in PMS.
