Google claims a breakthrough in mathematics with AI models for solving proofs

Illustration provided by Google.

On Thursday, Google DeepMind announced that AI systems called AlphaProof and AlphaGeometry 2 had solved four out of six problems in this year’s International Mathematical Olympiad (IMO), achieving a score equivalent to a silver medal. The tech giant claims this is the first time AI has achieved this level of performance in the prestigious math competition, but as usual in AI, the claims aren’t as clear-cut as they seem.

Google says AlphaProof uses reinforcement learning to prove mathematical statements in a formal language called Lean. The system is trained by generating and verifying millions of proofs, gradually tackling more difficult problems. AlphaGeometry 2, meanwhile, is described as an improved version of Google’s previous geometry-solving AI model, now powered by a Gemini-based language model trained on significantly more data.
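For readers unfamiliar with Lean, the following toy theorem gives a sense of the kind of object the system manipulates: a statement written in Lean's syntax, with a proof the kernel can mechanically check. This example is illustrative only; AlphaProof's actual proofs are vastly more complex.

```lean
-- A toy statement and proof in Lean 4, using a lemma from Lean's
-- standard library. The kernel verifies that the proof term really
-- establishes the stated proposition.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Because every step must pass the kernel's check, a Lean proof that compiles is guaranteed correct, which is what makes the language useful as a training and verification environment.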

According to Google, eminent mathematicians Sir Timothy Gowers and Dr. Joseph Myers evaluated the AI models’ solutions using the official IMO rules. The company reports that its combined system earned 28 out of a possible 42 points, just short of the gold medal threshold of 29 points. That includes a perfect score on the competition’s toughest problem, which Google claims only five human contestants solved this year.

A math competition unlike any other

The IMO, held annually since 1959, pits elite pre-college mathematicians against extremely difficult problems in algebra, combinatorics, geometry, and number theory. Performance on IMO problems has become a recognized benchmark for evaluating the mathematical reasoning abilities of an AI system.

Google says AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 tackled the geometry question. The systems are said to have failed to solve the two combinatorics problems. The company claims its systems solved one problem within minutes, while others took up to three days.

Google says it first translated the IMO problems into a formal mathematical language for its AI models to process. This step differs from the official competition, where competitors work directly with the problem statements during two 4.5-hour sessions.
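To make the translation step concrete, here is a sketch of what formalization involves: an informal English statement rendered as a precise Lean 4 theorem. The statement below is a hypothetical classroom-level example chosen for clarity, not an actual IMO problem.

```lean
-- Informal statement: "The sum of two even natural numbers is even."
-- One possible hand-formalization in Lean 4, with evenness spelled out
-- as "equals twice some natural number":
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k :=
  match ha, hb with
  | ⟨m, hm⟩, ⟨n, hn⟩ => ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```

Even in this simple case, the formal version forces choices (how to encode "even," which quantifiers to use) that a human must make before the prover can start, which is why the manual formalization step matters when comparing the AI's conditions to those of human contestants.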

Google reported that before this year’s competition, AlphaGeometry 2 could solve 83 percent of historical IMO geometry problems from the past 25 years, compared to its predecessor’s 53 percent success rate. The company claims the new system solved this year’s geometry problem in 19 seconds after receiving the formalized version.

Limits

Despite Google’s claims, Sir Timothy Gowers offered a more nuanced perspective on Google DeepMind’s models in a thread posted on X. While acknowledging the achievement as “far beyond what automatic theorem provers could do before,” Gowers pointed out several key qualifications.

“The main qualification is that the program required much more time than the human competitors – for some of the problems over 60 hours – and of course a much faster processing speed than the poor old human brain,” Gowers wrote. “If human contestants were allowed that kind of time per problem, they would undoubtedly have a higher score.”

Gowers also noted that people manually translated the problems into the formal Lean language before the AI models began their work. He emphasized that while the AI does the core mathematical reasoning, this “autoformalization” step is done by humans.

Regarding the broader implications for mathematical research, Gowers expressed uncertainty. “Are we close to the point where mathematicians are redundant? It’s hard to say. I guess we are still a breakthrough or two away,” he wrote. He suggested that the system’s long processing times indicated that it had not “figured out the math,” but admitted that “there is obviously something interesting going on when it works.”

Even with these limitations, Gowers speculates that such AI systems could become valuable research tools. “So we may be close to having a program that will allow mathematicians to get answers to a wide range of questions, provided those questions are not too difficult – something that a person can do in a few hours. It would be massively useful as a research tool, even if it is not itself capable of solving open problems.”