John Guiver scite author profile

This paper addresses the problem of extracting accurate labels from crowdsourced datasets, a key challenge in crowdsourcing. Prior work has focused on modeling the reliability of individual workers, for instance, by way of confusion matrices, and using these latent traits to estimate the true labels more accurately. However, this strategy becomes ineffective when there are too few labels per worker to reliably estimate their quality. To mitigate this issue, we propose a novel community-based Bayesian label aggregation model, CommunityBCC, which assumes that crowd workers conform to a few different types, where each type represents a group of workers with similar confusion matrices. We assume that each worker belongs to a certain community, where the worker's confusion matrix is similar to (a perturbation of) the community's confusion matrix. Our model can then learn a set of key latent features: (i) the confusion matrix of each community, (ii) the community membership of each user, and (iii) the aggregated label of each item. We compare the performance of our model against established aggregation methods on a number of large-scale, real-world crowdsourcing datasets. Our experimental results show that our CommunityBCC model consistently outperforms stateof-the-art label aggregation methods, gaining, on average, 8% more accuracy with the same amount of labels.

show abstract

Bayesian inference for Plackett-Luce ranking models

Guiver

Snelson

2009

120

View full text Add to dashboard Cite

Polynomial matrix primitive factorization over arbitrary coefficient field and related results

Guiver

Bose

1982

IEEE Trans. Circuits Syst.

View full text Add to dashboard Cite

Statistical models of music-listening sessions in social media

Zheleva

Guiver

Rodrigues

et al. 2010

View full text Add to dashboard Cite

User experience in social media involves rich interactions with the media content and other participants in the community. In order to support such communities, it is important to understand the factors that drive the users' engagement. In this paper we show how to define statistical models of different complexity to describe patterns of song listening in an online music community. First, we adapt the LDA model to capture music taste from listening activities across users and identify both the groups of songs associated with the specific taste and the groups of listeners who share the same taste. Second, we define a graphical model that takes into account listening sessions and captures the listening moods of users in the community. Our session model leads to groups of songs and groups of listeners with similar behavior across listening sessions and enables faster inference when compared to the LDA model. Our experiments with the data from an online media site demonstrate that the session model is better in terms of the perplexity compared to two other models: the LDA-based taste model that does not incorporate crosssession information and a baseline model that does not use latent groupings of songs.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

John Guiver

Developmental Profiles of Eczema, Wheeze, and Rhinitis: Two Population-Based Birth Cohort Studies

Community-based bayesian aggregation models for crowdsourcing

Bayesian inference for Plackett-Luce ranking models

Polynomial matrix primitive factorization over arbitrary coefficient field and related results

Statistical models of music-listening sessions in social media

Contact Info

Product

Resources

About