DOI: 10.29007/7jmg

Deep Reinforcement Learning for Synthesizing Functions in Higher-Order Logic

Abstract: The paper describes a deep reinforcement learning framework based on self-supervised learning within the proof assistant HOL4. A close interaction between the machine learning modules and the HOL4 library is achieved by the choice of tree neural networks (TNNs) as machine learning models and the internal use of HOL4 terms to represent tree structures of TNNs. Recursive improvement is possible when a task is expressed as a search problem. In this case, a Monte Carlo Tree Search (MCTS) algorithm guided by a TNN …
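The abstract describes an MCTS loop in which a learned value estimate, rather than a random rollout, scores leaf states. The sketch below is a minimal, self-contained illustration of that idea on a toy search problem (synthesizing a bit string); the target, the heuristic `value` function standing in for the TNN estimate, and all names are hypothetical, not taken from the paper's implementation.

```python
import math

# Toy search problem: synthesize a bit string matching a target.
# In the paper's setting the reward would be a TNN's value estimate of a
# HOL4 term; here a hypothetical heuristic stands in for that estimate.
TARGET = (1, 0, 1)

def value(state):
    """Fraction of positions matching the target (stand-in for the TNN)."""
    return sum(a == b for a, b in zip(state, TARGET)) / len(TARGET)

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action (next bit) -> Node
        self.visits = 0
        self.total = 0.0

def select_child(node, c=1.4):
    # UCT: mean reward plus an exploration bonus.
    def uct(n):
        mean = n.total / n.visits if n.visits else 0.0
        return mean + c * math.sqrt(math.log(node.visits + 1) / (n.visits + 1))
    return max(node.children.values(), key=uct)

def mcts(root, iterations=300):
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded and non-terminal.
        while len(node.state) < len(TARGET) and len(node.children) == 2:
            node = select_child(node)
            path.append(node)
        # Expansion: add one untried child if the node is non-terminal.
        if len(node.state) < len(TARGET):
            bit = next(b for b in (0, 1) if b not in node.children)
            node.children[bit] = Node(node.state + (bit,))
            node = node.children[bit]
            path.append(node)
        # Evaluation and backpropagation of the value estimate as reward.
        reward = value(node.state)
        for n in path:
            n.visits += 1
            n.total += reward
    # The most-visited root action is the recommended move.
    return max(root.children, key=lambda b: root.children[b].visits)

best_first_bit = mcts(Node(()))
print(best_first_bit)
```

Because the reward is a deterministic value estimate, the search concentrates visits on the branch whose best reachable state scores highest, which is the role the TNN plays in the paper's self-improvement loop.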

Cited by 9 publications (8 citation statements)
References 23 publications
“…These estimates are to be used as rewards in our improved MCTS algorithm. We choose a TNN as our machine learning model because it performs well on arithmetic and propositional formulas [7] as well as on Diophantine equations and combinators [6]. In our TNN, each HOL4 operator of arity a has a neural network associated with it modeling a function from R^(a×d) to R^d, where d is a globally fixed embedding size.…”
Section: Learning Provability
confidence: 99%
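The quoted statement describes the TNN architecture concretely: one small network per operator, mapping the concatenation of its a argument embeddings (a vector in R^(a×d)) to a single embedding in R^d. A minimal sketch of that recursive embedding, with a hypothetical operator set, random weights, and an illustrative d — none of which come from the paper:

```python
import numpy as np

d = 4  # globally fixed embedding size (illustrative value)
rng = np.random.default_rng(0)

# Hypothetical operators with their arities; one weight matrix per operator,
# mapping the concatenated argument embeddings (R^(a*d)) to R^d.
arity = {"ZERO": 0, "SUC": 1, "PLUS": 2}
weights = {op: rng.standard_normal((d, max(a, 1) * d))
           for op, a in arity.items()}

def embed(term):
    """Recursively embed a term (op, [subterms]) into R^d."""
    op, args = term
    if arity[op] == 0:
        x = np.ones(d)  # nullary operators embed a fixed input vector
    else:
        x = np.concatenate([embed(t) for t in args])
    return np.tanh(weights[op] @ x)

# Embed the term SUC(PLUS(ZERO, ZERO)).
v = embed(("SUC", [("PLUS", [("ZERO", []), ("ZERO", [])])]))
print(v.shape)  # (4,)
```

The recursion mirrors the term structure directly, which is what allows HOL4 terms to double as the computation trees of the TNN.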
“…Some efforts were, however, made to reconstruct a formula tree. Gauthier [Gau20] trained a tree network to construct a new tree by choosing one symbol at a time, in a manner similar to sequence-to-sequence models. Here, the network was given the input tree and the partially constructed output tree, and tasked with predicting the next output symbol in a way similar to Tree2Tree models [CAR18].…”
Section: Related Work
confidence: 99%
“…The results are better than those we are able to get here, but no new logics or problems are tried, and generalization and transfer have been very limited so far. The AlphaZero algorithm has also been applied in theorem proving to the synthesis of formulas (Brown and Gauthier 2019) and functions (Gauthier 2020).…”
Section: Introduction
confidence: 99%