AI Struggles to Translate Messy Math Proofs into Formal Code

Researchers tested AI tools that translate human-written math proofs into formal, machine-checkable code. The tools work well on clean, idealized proofs but often fail on the messy proofs people actually write. The result points to a large gap in how robustly AI handles informal mathematics.

Researchers released a study analyzing the robustness of proof autoformalizers: AI tools that convert human-written math proofs into formal code for systems like Lean 4. These tools, which are built on large language models, have shown promise when given well-formed, idealized informal proofs from curated datasets. However, the study found that when the input is a messy, real-world proof of the kind mathematicians actually write, the tools often fail to produce an accurate formalization. The authors argue that a truly robust proof autoformalizer must remain faithful to the original argument even when the informal proof is not perfectly structured, and they present this work as the first systematic study of that robustness gap.

This matters because it shows how far AI still has to go before it can work reliably with mathematics as people actually write it. Think of following someone's handwritten notes: easy when they are neat, confusing when they are scribbled. Current autoformalization tools are in the same position with proofs. They handle the tidy versions but need to get better at the messy ones that mathematicians actually produce.

If you're curious, you can explore Lean 4 yourself at leanprover-community.github.io/lean4. Lean is a proof assistant rather than an AI tool: you write the formal proof, and it checks every step. Try formalizing a simple math proof by hand. It will give you a taste of what AI autoformalizers are expected to produce, and of how much precision a formal proof demands.
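
To make the idea concrete, here is a small sketch of what a formalized proof can look like in Lean 4. The example is our own illustration and is not taken from the study. The informal argument is "the sum of two even numbers is even, because 2k + 2m = 2(k + m)". The theorem name and the way evenness is spelled out (as "there exists a k with a = 2 * k") are choices made for this sketch, and it assumes a recent Lean 4 toolchain with no extra libraries.

```lean
-- Informal proof (illustrative, not from the study):
--   "If a = 2k and b = 2m, then a + b = 2k + 2m = 2(k + m), so a + b is even."

-- `even_add_even` is a hypothetical name chosen for this sketch.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ m, b = 2 * m) :
    ∃ n, a + b = 2 * n := by
  -- Unpack the witness k with a = 2 * k.
  cases ha with
  | intro k hk =>
    -- Unpack the witness m with b = 2 * m.
    cases hb with
    | intro m hm =>
      -- Claim that k + m is the witness for a + b.
      refine ⟨k + m, ?_⟩
      -- Substitute both equations, then distribute 2 over k + m on the right.
      rw [hk, hm, Nat.mul_add]
```

Even in this tiny case, the formal version has to name every witness and justify every rewrite that the informal sentence leaves implicit. An informal proof that skips a step or uses loose notation leaves an autoformalizer more to reconstruct, which is the kind of input the study reports as a weak point.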