We focus on the problem of obtaining top-k lists of items from larger itemsets, using human workers to perform comparisons among items. An example application is short-listing a large set of college applications using advanced students as workers. We describe novel, efficient techniques and explore their tolerance to adversarial behavior, as well as the tradeoffs among different measures of performance (latency, expense, and quality of results). We empirically evaluate the proposed techniques against prior art using simulations as well as real crowds on Amazon Mechanical Turk. A randomized variant of the proposed algorithms achieves significant budget savings, especially for very large itemsets and large top-k lists, with negligible risk of lowering the quality of the output.
We propose novel algorithms for the problem of crowdsourcing binary labels. Such binary labeling tasks are very common in crowdsourcing platforms, for instance, to judge the appropriateness of web content or to flag vandalism. We propose two unsupervised algorithms: one that is simple to implement albeit derived heuristically, and one based on iterated Bayesian parameter estimation of user reputation models. We provide mathematical insight into the benefits of the proposed algorithms over existing approaches, and we confirm these insights by showing that both algorithms offer improved performance in many settings, across both synthetic and real-world datasets obtained via Amazon Mechanical Turk.
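To make the iterated estimation idea concrete, the following is a minimal sketch of one common way such an algorithm can be structured. It is an illustration under our own assumptions, not the authors' exact method; the function name, the data layout, and the 0.7 reliability prior are invented for the example. The sketch alternates between estimating each item's label from reliability-weighted votes and re-estimating each worker's reliability from agreement with those estimates.

```python
# A minimal sketch of iterated label/reliability estimation for binary
# crowdsourced labels (an illustration, not the authors' exact algorithm).
# All names and the 0.7 reliability prior are assumptions for the example.
import math
from collections import defaultdict

def aggregate_binary_labels(votes, iterations=20):
    """votes: list of (worker_id, item_id, label) tuples with label in {0, 1}."""
    workers = {w for w, _, _ in votes}
    items = {i for _, i, _ in votes}
    reliability = {w: 0.7 for w in workers}      # initial guess: better than random

    for _ in range(iterations):
        # Step 1: estimate each item's label by a reliability-weighted vote,
        # weighting each worker by the log-odds of their current reliability.
        score = defaultdict(float)
        for w, i, label in votes:
            p = min(max(reliability[w], 1e-3), 1 - 1e-3)
            weight = math.log(p / (1 - p))
            score[i] += weight if label == 1 else -weight
        estimate = {i: 1 if score[i] >= 0 else 0 for i in items}

        # Step 2: re-estimate each worker's reliability as their smoothed
        # agreement rate with the current label estimates.
        agree, total = defaultdict(int), defaultdict(int)
        for w, i, label in votes:
            total[w] += 1
            agree[w] += int(label == estimate[i])
        reliability = {w: (agree[w] + 1) / (total[w] + 2) for w in workers}

    return estimate, reliability

# Example: three workers label two items; w3 disagrees with the majority
# and ends up with a lower estimated reliability.
votes = [("w1", "a", 1), ("w2", "a", 1), ("w3", "a", 0),
         ("w1", "b", 0), ("w2", "b", 0), ("w3", "b", 1)]
labels, reliability = aggregate_binary_labels(votes)
```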
We consider crowdsourcing problems where workers are asked to provide evaluations for items; the worker evaluations are then used to estimate the true quality of the items. Lacking an incentive scheme, workers have no motive to put effort into the evaluations and may instead provide inaccurate answers. We show that a simple approach of providing incentives by assessing randomly chosen workers does not scale: to guarantee a truthful incentive, the number of workers that the supervisor needs to assess grows linearly with the total number of workers. To address this scalability problem, we propose incentive schemes that are truthful and cheap: truthful because the optimal worker behavior consists in providing accurate evaluations, and cheap because truthfulness is achieved with little supervision cost. We consider both discrete evaluation tasks, where an evaluation can be done either correctly or incorrectly, with no degrees of approximation in between, and quantitative evaluation tasks, where evaluations are real numbers and the error is measured as the distance from the correct value. For both types of tasks, we develop hierarchical incentive schemes that can be implemented with a small number of supervised evaluations and that scale to arbitrarily large crowd sizes: they have the property that the strength of the incentive does not weaken with increasing hierarchy depth. We show that the proposed hierarchical schemes are robust: they provide incentives in heterogeneous environments where workers may have limited proficiencies, as long as there are enough proficient workers in the crowd. Interestingly, we also show that for these schemes to work, the only requirement is that workers know their place in the hierarchy in advance.
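As a rough illustration of how a hierarchical scheme of this kind can be organized (a sketch under our own assumptions, not the paper's exact mechanism; `build_hierarchy`, `check_score`, `pick_overlap`, and the use of `None` for the supervisor are invented for the example): the supervisor personally evaluates a few items, the workers checked against her become the reference for the next level, and each worker shares a small set of overlap items with their parent on which agreement is measured.

```python
# A sketch of a hierarchical spot-checking scheme (assumptions for
# illustration, not the paper's exact mechanism). The supervisor is the root
# (id None); each worker re-evaluates a few "overlap" items also evaluated
# by their parent and is rewarded according to agreement on those items.
import random

def build_hierarchy(worker_ids, branching):
    """Arrange workers into levels; each worker gets a parent one level up."""
    levels, parents = [], {}
    frontier = [None]                        # level 0: the supervisor herself
    remaining = list(worker_ids)
    while remaining:
        width = branching * len(frontier)
        level, remaining = remaining[:width], remaining[width:]
        for idx, w in enumerate(level):
            parents[w] = frontier[idx // branching]
        levels.append(level)
        frontier = level
    return levels, parents

def pick_overlap(parent_items, k, rng=random):
    """Randomly pick k of the parent's items for the child to re-evaluate."""
    return rng.sample(list(parent_items), k)

def check_score(worker, parent, answers, overlap_items):
    """Fraction of shared items on which the worker matches the parent's
    evaluation; the supervisor's own evaluations are stored under key None."""
    agree = sum(answers[worker][i] == answers[parent][i] for i in overlap_items)
    return agree / len(overlap_items)
```

If each worker's bonus is made proportional to their `check_score`, every level is checked against the level above it and the chain is anchored at the supervisor's own evaluations at the root; this is one way to read the abstract's claim that the strength of the incentive does not weaken with hierarchy depth, while the supervisor's own workload stays limited to the top of the tree.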
We propose techniques that obtain top-k lists of items out of larger itemsets, using human workers to perform comparisons among items. An example application is short-listing a large set of college applications using advanced students as workers. A method that obtains crowdsourced top-k lists has to address several challenges of crowdsourcing: there are constraints on the total number of tasks due to monetary or practical reasons; tasks posted to workers have an inherent limit on their size; obtaining results from human workers has high latency; workers may disagree in their judgments of the same items or provide wrong results on purpose; and there can be varying difficulty among tasks of the same size. We describe novel, efficient techniques and explore their tolerance to adversarial behavior, as well as the tradeoffs among different measures of performance (latency, expense, and quality of results). We empirically evaluate the proposed techniques using simulations as well as real crowds on Amazon Mechanical Turk. A randomized variant of the proposed algorithms achieves significant budget savings, especially for very large itemsets and large top-k lists, with negligible risk of lowering the quality of the output.
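The following is a minimal sketch of a tournament-style reduction consistent with this description, under our own assumptions rather than necessarily the paper's exact algorithm: items are split into groups small enough to fit in one crowd task, each task ranks its group and the top k survive, and survivors are regrouped until they fit into a single final task. The random regrouping stands in for the randomized variant, and `crowd_rank_task` simulates a worker's ranking using known scores in place of a real task.

```python
# A minimal sketch of tournament-style crowdsourced top-k selection
# (an illustration under our own assumptions, not necessarily the paper's
# exact algorithm). `crowd_rank_task` stands in for posting one small ranking
# task to workers; here it is simulated with known item scores.
import random

def crowd_rank_task(group, scores):
    """Simulates one worker task: rank a small group of items, best first."""
    return sorted(group, key=lambda item: scores[item], reverse=True)

def crowd_top_k(items, k, task_size, scores):
    """Keep the top k of every group, round after round, until the survivors
    fit into a single task of at most `task_size` items."""
    assert k < task_size, "each task must be able to return k survivors"
    candidates = list(items)
    while len(candidates) > task_size:
        random.shuffle(candidates)            # randomized variant: random groups
        groups = [candidates[i:i + task_size]
                  for i in range(0, len(candidates), task_size)]
        candidates = [item for g in groups
                      for item in crowd_rank_task(g, scores)[:k]]
    return crowd_rank_task(candidates, scores)[:k]

# Example: 100 items with hidden quality scores, tasks of at most 10 items.
scores = {item: random.random() for item in range(100)}
top5 = crowd_top_k(range(100), k=5, task_size=10, scores=scores)
```

With a perfect ranking oracle, any item in the true top k is outranked by at most k-1 items in its group, so it survives every round; with noisy human judgments, repeating or overlapping tasks trades extra budget for output quality, which is the kind of tradeoff the abstract evaluates.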