A Test for Evaluating Performance in Human-AI Systems

Malone, Thomas W.; Vaccaro, Michelle; Campero, Andres; Song, Jaeyoon; Wen, Haoran; Almaatouq, Abdullah

doi:10.21203/rs.3.rs-2787476/v1

Cited by 1 publication

(5 citation statements)

References 37 publications

“…our last study, the tool did not show significant effects on performance. This finding is consistent with other work on LLMs [36,37] that also showed that improving performance synergistically can be difficult in various human-AI interaction scenarios [37]. Although GPT-3 produces impressively reasonable results in our studies, subjects remarked that the tool often made suggestions they already thought of.…”

Section: Discussion (supporting)

confidence: 92%

“…However, when human creativity and judgment are required, AI systems will support humans and increase their productivity but are unlikely to replace them completely. Good examples for this are programming or writing assistants [24,1,36,37,22,3]. When there is no clear division of labor and full automation is not the aim human-AI interaction remains challenging [18,38], and guidelines are needed [39].…”

Section: Discussion (mentioning)

confidence: 99%

“…Even though many tasks require human creativity and judgment, AI systems have the potential to support humans to increase their productivity and improve the overall results of the human-AI team, e.g. with programming or writing assistants [24,1,36,37,22,3]. However, designing human-AI interaction for such scenarios is not easy [18,38], and updated guidelines for interaction and interface design are required [39].…”

Section: Related Work (mentioning)

confidence: 99%

“…However, designing human-AI interaction for such scenarios is not easy [18,38], and updated guidelines for interaction and interface design are required [39]. This might also be why such hybrid teams sometimes do not improve performance [37]. Furthermore, the best possible hybrid team performance is not necessarily achieved by combining the most accurate systems with its user but rather those that are complementary [40,41] and use them as decision support [42].…”

Section: Related Work (mentioning)

confidence: 99%

“…Furthermore, the best possible hybrid team performance is not necessarily achieved by combining the most accurate systems with its user but rather those that are complementary [40,41] and use them as decision support [42]. Therefore, how the interaction of a hybrid team needs to be designed, how the team's performance is evaluated, and how successful they are is case-specific and achieving beneficial team synergy can be difficult [37]. Overall, how to best achieve human-AI synergy remains an important open research question.…”

Section: Related Work (mentioning)

confidence: 99%

Interacting with Large Language Models: A Case Study on AI-Aided Brainstorming for Guesstimation Problems

Salikutluk; Koert; Jäkel

2023

Frontiers in Artificial Intelligence and Applications

Designing cooperative AI systems that do not automate tasks but rather aid human cognition is challenging and requires human-centered design approaches. Here, we introduce AI-aided brainstorming for solving guesstimation problems, i.e., estimating quantities from incomplete information, as a testbed for human-AI interaction with large language models (LLMs). In a think-aloud study, we found that humans decompose guesstimation questions into sub-questions and often replace them with semantically related ones. If they fail to brainstorm related questions, they often get stuck and do not find a solution. Therefore, to support this brainstorming process, we prompted a large language model (GPT-3) with successful replacements from our think-aloud data. In follow-up studies, we tested whether the availability of this tool improves participants’ answers. While the tool successfully produced human-like suggestions, participants were reluctant to use it. From our findings, we conclude that for human-AI interaction with LLMs to be successful, AI systems must complement rather than mimic a user’s associations.
