2024
DOI: 10.1073/pnas.2318124121

Evaluating language models for mathematics through interactions

Katherine M. Collins, Albert Q. Jiang, Simon Frieder, et al.

Abstract: There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setting, and how that varies by setting. Static assessment therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype …

Cited by 7 publications
References 31 publications (18 reference statements)