We propose a new approach to automated theorem proving and deductive program synthesis where an AlphaZero-style agent is self-training to refine a high-level expert strategy expressed as a nondeterministic program. An analogous teacher agent is self-training to generate tasks of suitable relevance and difficulty for the learner. This allows leveraging minimal amounts of domain knowledge to tackle problems for which training data is unavailable or hard to synthesize. We illustrate our approach on the problem of loop invariant synthesis for imperative programs and using neural networks to refine both the teacher and solver strategies.Preprint. Under review.