2023
DOI: 10.48550/arxiv.2302.13814
Preprint

An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)

Abstract: We study the performance of a commercially available large language model (LLM) known as ChatGPT on math word problems (MWPs) from the dataset DRAW-1K. To our knowledge, this is the first independent evaluation of ChatGPT. We found that ChatGPT's performance changes dramatically based on the requirement to show its work, failing 20% of the time when it provides work compared with 84% when it does not. Further, several factors about MWPs relating to the number of unknowns and number of operations that lead to a …
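
The failure rates quoted in the abstract are per-condition fractions of incorrect answers. As a minimal sketch of how such a tally could be computed, assuming each evaluated problem is recorded as a (condition, is_correct) pair: the function name `failure_rates`, the condition labels, and the toy records below are all hypothetical and are not taken from the paper or from DRAW-1K.

```python
from collections import defaultdict


def failure_rates(records):
    """Return {condition: fraction of incorrect answers}.

    `records` is an iterable of (condition, is_correct) pairs.
    This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for condition, is_correct in records:
        totals[condition] += 1
        if not is_correct:
            failures[condition] += 1
    return {c: failures[c] / totals[c] for c in totals}


# Hypothetical toy records, NOT data from the paper.
toy_records = [
    ("shows_work", True), ("shows_work", True),
    ("shows_work", True), ("shows_work", False),
    ("no_work", False), ("no_work", False),
    ("no_work", False), ("no_work", True),
]

rates = failure_rates(toy_records)
for condition, rate in rates.items():
    print(f"{condition}: {rate:.0%} failed")
```

Keeping the tally separate per condition is what makes a comparison like "with work" versus "without work" possible; pooling the records would hide the gap.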

Cited by 17 publications (14 citation statements)
References 14 publications
“…and reported small shifts (most below 5%) in ChatGPT's performance on some common benchmarks. Other papers (Shakarian et al, 2023) also reported shifts in specific problems. Monitoring model performance shifts is an emerging research area for machine-learning-as-a-service (MLaaS) more broadly.…”
Section: Related Work (mentioning)
confidence: 89%
“…Moreover, an instruction of "approximating the decimal place" was not properly comprehended by ChatGPT during the Japanese-to-English translation. As such, calculation problems are reported as one of the areas where LLMs still exhibit relatively low accuracy [24], indicating that calculation problems may be a relatively unsuitable field for current ChatGPT.…”
Section: Discussion (mentioning)
confidence: 99%
“…A recent study by Pelton and Pelton (2023) and Shakarian et al (2023) investigated ChatGPT's performance in mathematics and supporting teacher education in mathematics. Their findings suggested that ChatGPT's performance is highly influenced by the requirement to show its work.…”
Section: Role Of Artificial Intelligence In Mathematics Problem-solving (mentioning)
confidence: 99%