When crowdsourcing the creation of machine learning datasets, statistical distributions that capture diverse answers can represent ambiguous data better than a single best answer. Unfortunately, collecting distributions is expensive because a large number of responses must be collected to form a stable distribution. Despite this, the efficient collection of answer distributions (that is, ways to use less human effort to estimate the distribution that a large group of responses would eventually form) is an under-studied topic. In this paper, we demonstrate that this type of estimation is possible and characterize different elicitation approaches to guide the development of future systems. We investigate eight elicitation approaches along two dimensions: annotation granularity and estimation perspective. Annotation granularity is varied by annotating i) a single "best" label, ii) all relevant labels, iii) a ranking of all relevant labels, or iv) real-valued weights for all relevant labels. Estimation perspective is varied by prompting workers to respond either with their own answer or with an estimate of the answer(s) that they expect other workers would provide. Our study collected ordinal annotations on the emotional valence of facial images from 1,960 crowd workers and found that, surprisingly, the most fine-grained elicitation methods were not the most accurate, despite workers spending more time to provide answers. Instead, the most efficient approach was to ask workers to choose all relevant classes that others would have selected. This resulted in a 21.4% reduction in the human time required to reach the same performance as the baseline (i.e., selecting a single answer from their own perspective). By analyzing cases in which finer-grained annotations degraded performance, we contribute to a better understanding of the trade-offs between answer elicitation approaches.
Our work makes it more tractable to use answer distributions in large-scale tasks such as ML training, and aims to spark future work on techniques that can efficiently estimate answer distributions.
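As a rough illustration of what estimating an answer distribution from "all relevant labels" annotations could look like, the sketch below turns multi-label responses into a distribution and compares it with a reference. The abstract does not say how responses were aggregated or scored, so the equal-split rule, the total variation metric, and the names `estimate_distribution` and `total_variation` are all assumptions made for this example, not the paper's method.

```python
from collections import Counter


def estimate_distribution(responses, labels):
    """Estimate a label distribution from multi-label responses.

    Assumption (not from the paper): each worker contributes one unit of
    mass, split evenly across the labels that worker selected.
    """
    mass = Counter()
    for selected in responses:
        chosen = [label for label in selected if label in labels]
        if not chosen:
            continue  # skip empty or out-of-vocabulary responses
        share = 1.0 / len(chosen)
        for label in chosen:
            mass[label] += share
    total = sum(mass[label] for label in labels)
    if total == 0:
        raise ValueError("no usable responses")
    return {label: mass[label] / total for label in labels}


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    # Hypothetical valence labels and responses, for illustration only.
    labels = ["negative", "neutral", "positive"]
    responses = [{"negative", "neutral"}, {"neutral"}, {"neutral", "positive"}]
    estimate = estimate_distribution(responses, labels)
    reference = {"negative": 0.2, "neutral": 0.6, "positive": 0.2}
    print(estimate, total_variation(estimate, reference))
```

Under this kind of setup, an elicitation approach would count as more efficient if its estimate gets close to the reference distribution with fewer responses or less worker time.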
We consider the design of prediction market mechanisms known as automated market makers. We show that we can design these mechanisms via the mold of exponential family distributions, a popular and well-studied probability distribution template used in statistics. We give a full development of this relationship and explore a range of benefits. We draw connections between the information aggregation of market prices and the belief aggregation of learning agents that rely on exponential family distributions. We develop a natural analysis of the market behavior as well as the price equilibrium under the assumption that the traders exhibit risk aversion according to exponential utility. We also consider similar aspects under alternative models, such as budget-constrained traders.
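One way to see the link between automated market makers and exponential families is the logarithmic market scoring rule (LMSR), a widely used market maker whose cost function has the same log-sum-exp form as the log-partition function of a categorical distribution. The abstract does not name LMSR, so treating it as the example here is an assumption, as are the function names and the liquidity parameter `b`. This is a minimal sketch, not the construction developed in the paper.

```python
import math


def lmsr_cost(q, b=1.0):
    """LMSR cost C(q) = b * log(sum_i exp(q_i / b)).

    q holds the outstanding shares per outcome; b is the liquidity
    parameter. The max is subtracted for numerical stability.
    """
    m = max(x / b for x in q)
    return b * (m + math.log(sum(math.exp(x / b - m) for x in q)))


def lmsr_prices(q, b=1.0):
    """Instantaneous prices: the gradient of the cost, i.e. a softmax of q / b."""
    m = max(x / b for x in q)
    weights = [math.exp(x / b - m) for x in q]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(q, delta, b=1.0):
    """Amount a trader pays to move the share vector from q to q + delta."""
    new_q = [a + d for a, d in zip(q, delta)]
    return lmsr_cost(new_q, b) - lmsr_cost(q, b)


if __name__ == "__main__":
    q = [0.0, 0.0]                      # two-outcome market, no shares sold yet
    print(lmsr_prices(q))               # uniform prices before any trade
    print(trade_cost(q, [1.0, 0.0]))    # cost of buying one share of outcome 0
    print(lmsr_prices([1.0, 0.0]))      # price of outcome 0 rises after the trade
```

Because the prices are the gradient of a log-partition-style function, they always form a probability vector, which is what allows market prices to be read as an aggregated belief.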
Undergraduates are unlikely to even consider graduate research in Computer Science if they do not know what Computer Science research is. Many programs aimed at introducing undergraduates to research are structured like graduate research programs, with a small number of undergraduates working with a faculty advisor. Further, women, under-represented minorities, and first-generation students may be too intimidated, or may find the idea of research too amorphous, and so miss out on these programs. As a consequence, we lose out on opportunities for greater diversity in CS research. We have started a pilot program in our department where a larger number of students (close to two dozen) work with a single faculty member as part of a research group focused on Machine Learning and related areas. The goal of this program is not to convince students to pursue a research career but rather to enable them to make a more informed decision about what role they would like research to play in their future. To evaluate our approach, we elicited student experience via two anonymized exit surveys. Students report that they develop a better understanding of what research in Computer Science is. Their interest in research increased, as did their reported confidence in their ability to do research, although not all students wanted to further pursue computer science research opportunities. Given the reported experience of female students, this program can offer a starting point for greater diversity in CS research.