Ground(less) Truth: A Causal Framework for Proxy Labels in Human-Algorithm Decision-Making

Guerdan, Luke; Coston, Amanda; Wu, Zhiwei Steven; Holstein, Kenneth

doi:10.48550/arxiv.2302.06503

Cited by 1 publication

(3 citation statements)

References 74 publications

(190 reference statements)

“…Instead they are inferred indirectly via proxies: measurements of properties that are observed in the data available to a model. The process of defining proxy variables for a construct of interest necessarily involves making simplifying assumptions, and there is often a considerable conceptual distance between ML proxies and the ways human decision-makers think about the targeted construct (Green and Chen 2021; Guerdan et al. 2023; Jacobs and Wallach 2021; Kawakami et al. 2022). In other words, O_H(X, a) ≠ O_M(X, a).…”

Section: Task Definition (mentioning)

confidence: 99%

“…Much prior work has studied settings where the ML model outperforms the human decision-maker. These studies are frequently focused on tasks where there are no reasons to expect upfront that the human and the ML model will have complementary strengths (Bansal et al. 2021; Guerdan et al. 2023; Holstein and Aleven 2021; Lurie and Mulligan 2020). For example, some experimental studies employ untrained crowdworkers on tasks that require extensive domain expertise, without which there is no reason to expect that novices would have complementary strengths (Fogliato, Chouldechova, and Lipton 2021; Lurie and Mulligan 2020; Rastogi et al. 2022).…”

Section: Introduction (mentioning)

confidence: 99%

“…For example, some experimental studies employ untrained crowdworkers on tasks that require extensive domain expertise, without which there is no reason to expect that novices would have complementary strengths (Fogliato, Chouldechova, and Lipton 2021; Lurie and Mulligan 2020; Rastogi et al. 2022). Other experimental studies are designed in ways that artificially constrain human performance, for instance, by eliminating the possibility that humans and ML systems have access to complementary information (Guerdan et al. 2023). Meanwhile, studies on human-ML decision-making in real-world settings such as healthcare (Tschandl et al. 2020; Patel et al. 2019) sometimes demonstrate better human-ML team performance than either agent alone.…”

Section: Introduction (mentioning)

confidence: 99%

A Taxonomy of Human and ML Strengths in Decision-Making to Investigate Human-ML Complementarity

Rastogi, Leqi, Holstein et al. 2023

HCOMP

Hybrid human-ML systems increasingly make consequential decisions in a wide range of domains. These systems are often introduced with the expectation that the combined human-ML system will achieve complementary performance, that is, the combined decision-making system will be an improvement compared with either decision-making agent in isolation. However, empirical results have been mixed, and existing research rarely articulates the sources and mechanisms by which complementary performance is expected to arise. Our goal in this work is to provide conceptual tools to advance the way researchers reason and communicate about human-ML complementarity. Drawing upon prior literature in human psychology, machine learning, and human-computer interaction, we propose a taxonomy characterizing distinct ways in which human and ML-based decision-making can differ. In doing so, we conceptually map potential mechanisms by which combining human and ML decision-making may yield complementary performance, developing a language for the research community to reason about design of hybrid systems in any decision-making domain. To illustrate how our taxonomy can be used to investigate complementarity, we provide a mathematical aggregation framework to examine enabling conditions for complementarity. Through synthetic simulations, we demonstrate how this framework can be used to explore specific aspects of our taxonomy and shed light on the optimal mechanisms for combining human-ML judgments.

