2021
DOI: 10.48550/arxiv.2109.00110
Preprint

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

Abstract: We present miniF2F, a dataset of formal Olympiad-level mathematics problem statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, and Isabelle and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses. We report baseline results using GPT-f [12], a neural theorem prover based on GPT-3 and pro…
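To give a concrete sense of what the benchmark contains, here is a minimal, hypothetical sketch of a miniF2F-style statement in Lean 3 / mathlib syntax. The theorem name, the specific equation, and the proof are assumptions made for illustration, not an actual miniF2F entry; in the benchmark itself only the statement is given and the neural prover must supply the proof.

import data.real.basic
import tactic

-- Hypothetical AMC-style algebra exercise: from 3 * x + 7 = 25, conclude x = 6.
-- In miniF2F the proof would be omitted; a prover such as GPT-f must find it.
theorem amc_style_example (x : ℝ) (h₀ : 3 * x + 7 = 25) : x = 6 :=
by linarith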

Cited by 4 publications (11 citation statements). References 18 publications.
“…After each stage of fine-tuning, we evaluate the neural theorem prover on miniF2F [54]. The results are shown in Table 3.…”
Section: Results (mentioning)
confidence: 99%
“…MiniF2F [54] is a recently introduced benchmark containing 488 mathematical competition statements manually formalized by humans in three different formal languages. Its goal is to compare and benchmark methods across different theorem provers for machine learning research.…”
Section: Mathematical Competition Datasets (mentioning)
confidence: 99%
“…In Polu et al [18], the model is fine-tuned on theorems from the training set and expert iteration is done on theorems from different sources: train theorems, synthetic statements, and an extra curriculum of statements without proofs (miniF2F-curriculum). The produced model is then evaluated on unseen statements, namely the validation and test splits of the miniF2F dataset [8].…”
Section: Evaluation Settings and Protocol (mentioning)
confidence: 99%
“…• State-of-the-art performance on all analyzed environments. In particular, our model manages to prove over 82.6% of proofs in a held-out set of theorems from set.mm in Metamath, as well as 58.6% on miniF2F-valid [8] in Lean.…”
Section: Introduction (mentioning)
confidence: 99%