Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2022.emnlp-main.392
LILA: A Unified Benchmark for Mathematical Reasoning

Cited by 12 publications (4 citation statements)
References: 0 publications

“…NeuLR (Xu et al, 2023c) assesses deductive reasoning, inductive reasoning, and abductive reasoning, emphasizing LLMs' capabilities in these distinct reasoning directions. TabMWP (Lu et al, 2023b), LILA (Mishra et al, 2022a), and miniF2F v1 (Zheng et al, 2022) all scrutinize LLMs' reasoning prowess in mathematics. The TabMWP dataset requires LLMs to engage in table-based Q&A and mathematical reasoning based on provided text and table data.…”
Section: Reasoning
confidence: 99%
“…Models are tasked with selecting the optimal option from two alternatives within predefined scenarios. • LILA (Mishra et al, 2022a). The LILA dataset evaluates LLMs' mathematical reasoning skills through 23 tasks across four dimensions.…”
Section: D5 Reasoning
confidence: 99%
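As a rough illustration of how a unified benchmark like LILA might be inspected programmatically, here is a minimal sketch using the Hugging Face `datasets` library. The repository id, configuration name, and field names are assumptions for illustration only and should be checked against the official LILA release; they are not stated in the excerpts above.

```python
# Minimal sketch: loading and inspecting one LILA task with Hugging Face `datasets`.
# The repo id "allenai/lila", the config "addsub", and the field names below are
# assumptions, not confirmed by this page; see the official LILA release for the
# actual identifiers of its 23 tasks.
from datasets import load_dataset

lila_task = load_dataset("allenai/lila", "addsub", split="test")

example = lila_task[0]
print(example["input"])           # natural-language math question (assumed field name)
print(example["output_answer"])   # gold answer (assumed field name)
print(example["output_program"])  # program that derives the answer (assumed field name)
```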
“…Geometry3K (Lu et al, 2021) is a geometry problem-solving dataset that provides formal representations, but the dataset size is small and the problems do not require complex reasoning. AQuA (Ling et al, 2017), NumGLUE (Mishra et al, 2022b) and Lila (Mishra et al, 2022a) are large-scale datasets of various math problems. They have been used as benchmarks in solving math word problems and mathematical reasoning tasks, but we find that these datasets require only a few reasoning steps.…”
Section: Related Work
confidence: 99%
“…For example, chain of thought (Wei et al, 2022) and scratchpad (Nye et al, 2021) induce generation of explanations associated with a reasoning question. Similarly, other methods induce specific reasoning structures such as question summarization (Kuznia et al, 2022), question decomposition (Patel et al, 2022), program generation (Mishra et al, 2022a;Chen et al, 2022;Gao et al, 2023b), etc. However, in real-world user traffic, queries can be diverse, covering various reasoning structures.…”
Section: Reasoning and Planning
confidence: 99%
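To make the "program generation" style of reasoning mentioned in the statement above concrete, the sketch below shows the general pattern in the spirit of program-aided prompting: the model emits a short Python program whose execution yields the answer. The prompt wording and the `call_llm` helper are hypothetical stand-ins, not the method of any cited paper.

```python
# Minimal sketch of program-generation-style reasoning: ask a model for a short
# Python program, execute it, and read off the answer. `call_llm` is a
# hypothetical placeholder for any text-generation API.

PROMPT_TEMPLATE = """Write a Python program that computes the answer.
Store the final result in a variable named `answer`.

Question: {question}
Program:
"""

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in; a real system would query an LLM here.
    # For illustration, return a fixed program for the example question below.
    return "answer = 23 - 20 + 6"

def solve_with_program(question: str) -> float:
    program = call_llm(PROMPT_TEMPLATE.format(question=question))
    scope: dict = {}
    exec(program, {}, scope)  # execute the generated program (trusted input assumed)
    return scope["answer"]

print(solve_with_program(
    "Olivia has 23 apples, gives away 20, then buys 6 more. How many now?"
))  # -> 9
```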