BFS-Prover-V2

A SoTA step-level theorem prover built for real use in Lean

Ran Xin*1 Zeyu Zheng*1,2 Yanchen Nie*1,3 Kun Yuan3 Xia Xiao1

* Equal contribution

1ByteDance Seed    2Carnegie Mellon University    3Peking University



Interactive Theorem Proving Done Right

While whole-proof models dominate benchmarks, their inference is less efficient, their proofs are often less natural (deeply nested `have` statements), they struggle with advanced tactic usage, and they offer limited interactivity. Step-level provers generate one tactic at a time, enabling fast, tactically expressive, and interactive theorem proving in Lean.

- miniF2F: 95.08%, on par with the best whole-proof models
- ProofNet: 41.4%, setting a new SoTA
- miniF2F: +19.3% with the planner and +10.3% with pure tree search over the previous best step-level prover
- 💻 Ollama: real-time tactic suggestions in Lean on your own laptop, via Ollama and LLMLean

Live Demo in Lean

Watch BFS-Prover-V2 in action on a personal laptop with LLMLean integration in VS Code

Introduction

We introduce BFS-Prover-V2, the state-of-the-art open-source step-level theorem-proving system for Lean 4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. Building on the foundation of BFS-Prover-V1, it introduces improvements that enable sustained learning over extended training periods and efficient exploration of complex proof spaces.

BFS-Prover-V2 tackles these challenges on two fronts: (1) Training-time scaling: a multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining, which surmounts the performance plateaus that typically curtail long-term post-training; (2) Inference-time scaling: a planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time.

BFS-Prover-V2 achieves 95.08% and 41.4% on the miniF2F and ProofNet test sets respectively, setting a new state-of-the-art for step-level provers.

Technical Overview

Scaling up Training: Multi-Stage Expert Iteration

Figure: Training-time scaling architecture (training pipeline)

A central challenge in scaling expert iteration is managing vast quantities of synthetic data while avoiding diminishing returns and mode collapse. We introduce two key innovations:

Adaptive Tactic Filtering

Instead of coarse problem-level filtering, we adopt fine-grained tactic-level filtering guided by the model's perplexity distribution over tactics. We filter out both low-perplexity tactics (overly simple, offering no learning signal) and high-perplexity tactics (likely noisy or suboptimal). This acts as automated curriculum learning, using the model's own uncertainty to identify the most valuable training data.
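The core of this filter fits in a few lines of Python. The sketch below is illustrative only: the TacticSample fields, threshold values, and sample data are hypothetical placeholders, not the actual values or code used in the BFS-Prover-V2 pipeline.

# Illustrative sketch of tactic-level perplexity filtering.
# Field names and thresholds are hypothetical, not BFS-Prover-V2's actual values.
from dataclasses import dataclass

@dataclass
class TacticSample:
    state: str         # Lean tactic state shown to the model
    tactic: str        # tactic generated for that state
    perplexity: float  # current prover's perplexity on (state -> tactic)

def filter_tactics(samples: list[TacticSample],
                   low: float = 1.1, high: float = 8.0) -> list[TacticSample]:
    """Keep only tactics in the informative perplexity band:
    <= low  : already mastered, little learning signal, dropped;
    >= high : likely noisy or suboptimal, also dropped."""
    return [s for s in samples if low < s.perplexity < high]

samples = [
    TacticSample("⊢ a + b = b + a", "ring", 1.02),                                # trivial: dropped
    TacticSample("⊢ 0 ≤ (a - b) ^ 2", "nlinarith [sq_nonneg (a - b)]", 3.4),      # kept
    TacticSample("⊢ a + b = b + a", "rw [add_comm, add_comm, add_comm]", 40.0),   # noisy: dropped
]
print(len(filter_tactics(samples)))  # -> 1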

Periodic Retraining: Escaping Local Optima

When the model plateaus after several expert iterations, we perform a "soft reset": (1) Use the current best prover to re-synthesize and de-noise all past proofs; (2) Retrain a fresh model from scratch on the refined dataset. This temporarily drops performance but increases exploration capacity, allowing the model to surpass previous peaks.
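Schematically, the multi-stage loop can be written as the following Python skeleton. Every callable it takes (run_search, finetune, resynthesize, train_from_scratch, plateaued) is a hypothetical placeholder for the corresponding stage, not part of any released API.

# Schematic of expert iteration with periodic soft resets.
# All injected callables are hypothetical placeholders.
def expert_iteration_with_resets(base_ckpt, theorems, num_stages,
                                 run_search, finetune, resynthesize,
                                 train_from_scratch, plateaued):
    model, corpus = base_ckpt, []
    for _ in range(num_stages):
        # Standard expert iteration until progress stalls.
        while not plateaued(model):
            new_proofs = run_search(model, theorems)  # tree search + Lean verification
            corpus.extend(new_proofs)
            model = finetune(model, new_proofs)       # after tactic-level filtering
        # Soft reset: de-noise the accumulated proofs with the current best
        # prover, then retrain a fresh model from scratch on the refined corpus.
        corpus = resynthesize(model, corpus)
        model = train_from_scratch(base_ckpt, corpus)
    return model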

Scaling up Inference: Planner-Enhanced Multi-Agent Search

Figure: Inference-time scaling architecture

Our hierarchical inference architecture mirrors how human mathematicians work: sketch a high-level plan, then fill in details. It consists of two key components:

Planner-Guided Search

The planner (a general-purpose reasoning LLM) decomposes the main theorem into a sequence of intermediate subgoals. The prover (our specialized model) solves subgoals sequentially via tree search. If a subgoal fails, the planner is re-queried to generate a revised plan that corrects the strategy.
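The interaction can be pictured as a simple control loop. In the sketch below, plan, prove_subgoal, and revise_plan are hypothetical stand-ins for the planner LLM and the prover's tree search; none of the names come from the released system.

# Illustrative planner-prover loop; all injected callables are hypothetical.
def prove_with_planner(theorem, plan, prove_subgoal, revise_plan, max_revisions=3):
    proved = []                              # (subgoal, proof) pairs, reusable as known facts
    subgoals = list(plan(theorem))           # planner decomposes the theorem into subgoals
    revisions = 0
    while subgoals:
        goal = subgoals[0]
        proof = prove_subgoal(goal, known_facts=proved)   # step-level tree search
        if proof is not None:
            proved.append((goal, proof))
            subgoals = subgoals[1:]
        elif revisions < max_revisions:
            # Re-query the planner with the failure so it can correct the strategy.
            subgoals = list(revise_plan(theorem, proved, failed=goal))
            revisions += 1
        else:
            return None                      # give up on this theorem
    return proved                            # subgoal proofs compose into the full proof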

Parallel Provers & Shared Subgoal Cache

Multiple prover agents work in parallel on a single subgoal at a time. Once the first agent finds a proof, all others immediately terminate to prevent redundant computation. Proven subgoals are cached and used as known facts in subsequent proofs.
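The early-termination pattern and the shared cache can be illustrated with ordinary Python threads. In the minimal sketch below, try_prove is a dummy stand-in for a real prover agent, and the whole script is an illustration of the pattern rather than the production implementation.

# Minimal sketch (Python 3.10+) of prover agents racing on one subgoal,
# with a shared cache of proven subgoals. `try_prove` is a dummy stand-in.
import concurrent.futures
import random
import threading
import time

subgoal_cache: dict[str, str] = {}   # subgoal -> proof, reused for later subgoals
cache_lock = threading.Lock()

def try_prove(subgoal: str, seed: int, stop: threading.Event) -> str | None:
    rng = random.Random(seed)
    for _ in range(200):
        if stop.is_set():                 # another agent already found a proof
            return None
        time.sleep(0.01)                  # placeholder for one tree-search expansion
        if rng.random() < 0.02:
            return f"<proof of {subgoal!r} found by agent {seed}>"
    return None

def prove_in_parallel(subgoal: str, num_agents: int = 4) -> str | None:
    with cache_lock:
        if subgoal in subgoal_cache:      # reuse a previously proven subgoal
            return subgoal_cache[subgoal]
    stop = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(num_agents) as pool:
        futures = [pool.submit(try_prove, subgoal, s, stop) for s in range(num_agents)]
        for fut in concurrent.futures.as_completed(futures):
            proof = fut.result()
            if proof is not None:
                stop.set()                # terminate the remaining agents
                with cache_lock:
                    subgoal_cache[subgoal] = proof
                return proof
    return None

if __name__ == "__main__":
    print(prove_in_parallel("a + b = b + a"))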

Benchmark Performance

Robust and generalizable performance across domains

Model                        | miniF2F-test | miniF2F-valid | ProofNet-test
BFS-Prover-V2-32B w/ Planner | 95.08%       | 95.5%         | —
BFS-Prover-V2-32B            | 86.1%        | 85.5%         | 41.4%
BFS-Prover-V2-7B             | 82.4%        | —             | —
Benchmark performance of BFS-Prover-V2 series

Getting Started

Use BFS-Prover-V2 in Lean via LLMLean

1. Install Ollama & Pull Model

Example using 7B model (q8_0 quantization):

ollama pull zeyu-zheng/BFS-Prover-V2-7B:q8_0

2. Add LLMLean to Lean Project

-- lakefile.lean
require llmlean from git
  "https://github.com/zeyu-zheng/llmlean.git"
  @ "bfs-pr"

# or lakefile.toml
[[require]]
name = "llmlean"
git = "https://github.com/zeyu-zheng/llmlean.git"
rev = "bfs-pr"

3. Configure LLMLean

Example using 7B model (q8_0 quantization):

# ~/.config/llmlean/config.toml
api = "ollama"
model = "zeyu-zheng/BFS-Prover-V2-7B:q8_0"
mode = "parallel"
prompt = "tacticstate"
responseFormat = "tactic"
numSamples = "5"

4. Use in Lean

import Mathlib
import LLMlean

example : ... := by
  llmstep ""  -- Get suggestions for next tactic
  -- or
  llmstep "rw"  -- Get suggestions for using `rw` next

Citation

@article{xin2025scaling,
  title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
  author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
  journal={arXiv preprint arXiv:2509.06493},
  year={2025}
}