New Open-Source Framework Tackles 'Hard Mode' Automated Theorem Proving
Researchers introduce a framework for 'Hard Mode' theorem proving, where AI must discover answers independently. They release two expert-annotated benchmarks to advance this challenging research area.

Researchers have developed an open-source framework for automated theorem proving (ATP) in Lean 4. The framework introduces 'Hard Mode,' a stricter setting in which an AI system must discover the answer on its own before constructing a formal proof. This contrasts with the more common 'Easy Mode,' where the answer is already embedded in the problem statement and only the proof remains to be found.
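To make the distinction concrete, here is a minimal Lean 4 sketch of the same toy problem posed both ways. It is an illustration only: the equation, the theorem names, and the `hardAnswer` definition are assumptions made for this example, not taken from MiniF2F-Hard or FIMO-Hard, and the framework's actual encoding of Hard Mode may differ.

```lean
-- Easy Mode: the answer (x = 2) is written into the statement,
-- so the prover only has to justify it.
theorem easy_mode (x : Nat) (h : 2 * x + 3 = 7) : x = 2 := by
  omega

-- Hard Mode (sketch): the statement refers to an answer that is not given.
-- The system must first propose a value for `hardAnswer` (hypothetical name)
-- and then prove the statement with that value filled in.
def hardAnswer : Nat := 2  -- candidate answer the system had to discover

theorem hard_mode (x : Nat) (h : 2 * x + 3 = 7) : x = hardAnswer := by
  unfold hardAnswer  -- expose the proposed value
  omega
```

In this sketch, a wrong candidate such as `hardAnswer := 3` would leave `hard_mode` unprovable, which is why discovering the answer becomes part of the task rather than a given.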
This work matters because it could give a more realistic assessment of AI capabilities in theorem proving. Current benchmarks often simplify the task, leading to overly optimistic estimates of model performance. The researchers are releasing MiniF2F-Hard and FIMO-Hard, two expert-annotated Hard Mode variants of the widely used MiniF2F and FIMO benchmarks, with the aim of fostering more rigorous and practical research in the field.
With these benchmarks available, researchers and developers can test and refine their models in a more challenging and realistic setting. Because the framework is open source, the community can collaborate on it and extend it, which could support further progress in automated reasoning and formal verification.