In a piece I wrote a few months ago, I argued that research mathematics had become an unexpectedly useful test case for AI, precisely because mathematical claims are either right or wrong, which makes AI outputs verifiable in a way that many business applications are not. That argument has just been put to the test. An internal OpenAI model did not merely help around the edges of a research workflow. It found the core construction behind a disproof of a famous 80-year-old conjecture, in a problem many experts expected to go the other way. The result is not just a milestone for AI. It is a concrete example of what happens when a system is willing to explore a low-probability path that the expert community had largely written off.
When AI Found the Counterexample
The Erdős unit distance problem is an 80-year-old question that sounds like something from a puzzle book: if I put n points on a flat sheet of paper, how many pairs of points can be exactly one unit apart? Erdős suspected the answer could grow only a little faster than n itself. That simple setup is why the problem is so famous. It is easy to explain, but hard enough that it drew decades of attention from serious mathematicians. And it is not just recreational math. Problems like this matter because they often expose unexpected links between fields, and this one did exactly that. The fact that it had resisted sustained effort for eight decades, and was described by experts as “possibly the best known problem in combinatorial geometry,” is exactly what made it a meaningful benchmark.
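To make the question concrete, here is a minimal Python sketch that counts unit-distance pairs by brute force for two small point sets. The function names (`count_unit_pairs`, `square_grid`, `triangular_patch`) are my own illustrative choices, and the two layouts are just familiar examples. Neither has anything to do with the construction in the paper.

```python
from itertools import combinations


def count_unit_pairs(points, tol=1e-9):
    """Count unordered pairs of points at Euclidean distance 1 (within tol)."""
    count = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        squared_distance = (x1 - x2) ** 2 + (y1 - y2) ** 2
        # Compare squared distances so no square root is needed.
        if abs(squared_distance - 1.0) <= tol:
            count += 1
    return count


def square_grid(k):
    """k*k points on the integer grid, neighbors one unit apart."""
    return [(i, j) for i in range(k) for j in range(k)]


def triangular_patch(k):
    """k*k points on a triangular lattice with unit side length."""
    height = 3 ** 0.5 / 2  # vertical spacing between lattice rows
    return [(i + 0.5 * j, height * j) for i in range(k) for j in range(k)]


if __name__ == "__main__":
    for name, pts in [("square", square_grid(3)), ("triangular", triangular_patch(3))]:
        print(name, len(pts), "points,", count_unit_pairs(pts), "unit pairs")
```

With nine points each, the square grid gives 12 unit pairs and the triangular patch gives 16, so the layout alone changes the count. The open question was how far that count can be pushed as n grows.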
Enter AI. An internal OpenAI model disproved the conjecture, meaning it found a construction showing Erdős was wrong: in the asymptotic sense mathematicians care about, you can achieve more unit distances than his conjecture allowed. The model produced a complete mathematical proof in a single pass, which was then checked, digested, and modestly simplified by a group of nine mathematicians, including Timothy Gowers and Noga Alon, who co-authored the paper and polished the steps into a cleaner, more readable write-up. The key idea was to stop treating the problem as a drawing exercise and turn it into an arithmetic construction problem: find a number system that secretly contains many equal-distance relationships, then translate that structure back into points in the plane.
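A toy illustration of that arithmetic-to-geometry move, under loud assumptions: this is the classical integer-grid trick, not the model's construction, which the article does not spell out. On an integer grid, the number of pairs at a given distance depends on how many ways the squared distance can be written as a sum of two squares. Rescaling the grid by that distance then turns those pairs into unit-distance pairs. All function names below are hypothetical.

```python
from itertools import combinations


def integer_grid(k):
    """k*k points with integer coordinates."""
    return [(i, j) for i in range(k) for j in range(k)]


def count_pairs_at_squared_distance(points, d2):
    """Count unordered pairs whose squared distance equals d2 (exact integers)."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


def representations_as_two_squares(d2):
    """Count integer vectors (a, b) with a*a + b*b == d2."""
    bound = int(d2 ** 0.5) + 1
    return sum(
        1
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if a * a + b * b == d2
    )


if __name__ == "__main__":
    grid = integer_grid(10)
    for d2 in (1, 25):
        print(
            "squared distance", d2,
            "| representations:", representations_as_two_squares(d2),
            "| pairs in 10x10 grid:", count_pairs_at_squared_distance(grid, d2),
        )
```

On a 10 by 10 grid, squared distance 1 has 4 representations and 180 pairs, while squared distance 25 has 12 representations (5² + 0² and 3² + 4², with signs and order) and 268 pairs. Dividing every coordinate by 5 would make those 268 pairs exactly one unit apart. The arithmetic of the number 25 is doing the geometric work, which is the flavor of idea the paragraph above describes.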
The AI Proof, Through Mathematicians’ Eyes
What struck me most about the reactions is that the mathematicians did not treat this as a stunt. Several said they would have accepted the result in a top journal without hesitation if a human had submitted it. A few noted that this was the first autonomous AI result they found genuinely interesting as mathematics, not merely as a signal about where AI is headed. That distinction matters. The result stands on its own merits. The fact that a machine produced it adds context, but does not change the underlying quality of the argument.
What made the result surprising was not just that AI solved a hard problem, but that it pushed against the community’s default intuition. Most experts expected Erdős’s conjecture to be true, which meant most human effort had gone into trying to prove it rather than break it. The model apparently did not share that prior. Several mathematicians pointed to a combination of factors that likely kept humans from finding this path: the field’s collective belief that the conjecture was correct, the fact that the most natural generalization of the original construction does not obviously improve the bounds, and the sheer navigational difficulty of the construction once you commit to it. The model’s apparent advantage was not creativity in any romantic sense. It was something more operational: broad technical recall, tolerance for awkward search paths, and a willingness to keep working in a direction that a human researcher would reasonably abandon. That pattern should be familiar to anyone building AI systems for enterprise work. Many hard problems also sit between specialties, carry weak early signals, and require combining tools that rarely live in the same team’s mental model.
The cautions in the reactions are also worth noting. The paper itself is a human-verified, human-improved version of the original AI output, which is a reminder that the model found the path but did not produce the final research artifact alone. Melanie Matchett Wood warns that one successful case should not obscure how often AI systems produce arguments that look plausible but are wrong, and that the mathematical community is not yet well-prepared for a world where convincing and correct are increasingly easy to confuse. There are also unsettled questions about attribution and norms: the proof draws on ideas from earlier work by several research groups, and it is not obvious how credit and consent should work when an AI system absorbs that work and uses it to produce a commercially significant result.
So I would not read this as a replacement story. I would read it as a preview of a sharper division of labor: machines expand the search space, while humans become more important as reviewers, editors, judges, and stewards of the norms that make knowledge reliable. That lesson travels well beyond mathematics.

