[Published today by DeepMind]

Our AI system surpasses the state-of-the-art approach for geometry problems, advancing AI reasoning in mathematics

Reflecting the Olympic spirit of ancient Greece, the International Mathematical Olympiad is a modern-day arena for the world's brightest high-school mathematicians. The competition not only showcases young talent, but has emerged as a testing ground for advanced AI systems in math and reasoning.

In a paper published today in Nature, we introduce AlphaGeometry, an AI system that solves complex geometry problems at a level approaching a human Olympiad gold medalist: a breakthrough in AI performance. In a benchmarking test of 30 Olympiad geometry problems, AlphaGeometry solved 25 within the standard Olympiad time limit. For comparison, the previous state-of-the-art system solved 10 of these geometry problems, and the average human gold medalist solved 25.9 problems.

The links to the paper in the announcement appear to be broken, so here is a direct link:

https://www.nature.com/articles/s41586-023-06747-5

Interesting that the transformer used is tiny. From the paper:

We use the Meliad library for transformer training with its base settings. The transformer has 12 layers, embedding dimension of 1,024, eight heads of attention and an inter-attention dense layer of dimension 4,096 with ReLU activation. Overall, the transformer has 151 million parameters, excluding embedding layers at its input and output heads. Our customized tokenizer is trained with ‘word’ mode using SentencePiece and has a vocabulary size of 757. We limit the maximum context length to 1,024 tokens and use T5-style relative position embedding. Sequence packing is also used because more than 90% of our sequences are under 200 in length.
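As a sanity check, the 151M figure is what you'd expect from a standard transformer of that shape. Assuming dense Q/K/V/output attention projections and a two-matrix feed-forward block, and ignoring layer norms, biases, and the excluded embeddings, a quick back-of-the-envelope:

```python
# Back-of-the-envelope parameter count for the architecture quoted above.
d_model, n_layers, d_ff = 1024, 12, 4096

attn = 4 * d_model**2       # Q, K, V and output projections, each d_model x d_model
ffn = 2 * d_model * d_ff    # the two dense maps of the feed-forward block

print(f"{n_layers * (attn + ffn):,}")  # 150,994,944 ~= 151M, matching the paper
```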

9 comments

Another interesting fact is that without any NN, but using the rest of the approach from the paper, their method gets 18/30 correct. The NN boosts this to 25/30. The prior SOTA was 10/30 (also without a NN).

So arguably about half of the gain over the prior SOTA comes just from improvements in the non-AI components: (18 - 10)/(25 - 10) ≈ 53%.
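For context, the paper describes the two parts interleaving: the symbolic engine deduces to saturation, and when it gets stuck, the language model proposes an auxiliary construction and the engine tries again. A heavily simplified sketch of that loop, with the two callables as hypothetical stand-ins rather than AlphaGeometry's actual API:

```python
# Simplified sketch of the deduce/propose loop described in the paper.
# `deduce` and `propose_construction` are hypothetical stand-ins.

def solve(premises, goal, deduce, propose_construction, max_constructions=10):
    facts = set(premises)
    for _ in range(max_constructions):
        facts = deduce(facts)                    # symbolic closure (DD+AR in the paper)
        if goal in facts:
            return facts                         # proof found
        aux = propose_construction(facts, goal)  # LM suggests a new point/line
        if aux is None:
            return None                          # out of ideas
        facts.add(aux)
    return None
```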

Hmm, it might be questionable to call that "non-AI", though. It's based on symbolic and algebraic deduction engines, and as far as I can tell it's the sort of thing that used to be very much mainstream "AI", i.e. symbolic AI plus some hard-coded human heuristics.
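For intuition about what that flavor of symbolic deduction looks like, here is a toy forward-chaining loop over hand-written rules; AlphaGeometry's actual DD+AR engine works over geometric predicates with variable binding, so this is only the flavor:

```python
# Toy forward-chaining deduction: apply rules until no new facts appear.
# Facts are opaque strings here; real engines bind variables over points.

def forward_chain(facts, rules):
    """Saturate a fact set: each rule is (set_of_premises, conclusion)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({"AB_para_CD", "CD_para_EF"}, "AB_para_EF"),  # transitivity of parallels
    ({"AB_para_EF"}, "angle(AB,EF)=0"),            # a made-up follow-on fact
]
print(forward_chain({"AB_para_CD", "CD_para_EF"}, rules))
```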

Sure, just seems like a very non-central example of AI from the typical perspective of LW readers.

Interesting: how does the non-AI portion work?

This is another example of how matching specialized human reasoning skill seems routinely feasible with search guided by 100M-scale networks trained for a task a human would spend years mastering. These tasks seem specialized, but it's plausible that the full breadth of human activity can be covered with a reasonable number of such areas of specialization. What's currently missing is automating the formulation and training of systems specialized in any given skill.

The often-touted, surprisingly good human sample efficiency might just mean that when training is set up correctly, it's sufficient to train models whose size is comparable to the amount of external text data a human would need to master a skill, rather than models whose size is comparable to a human brain. This doesn't currently work for training systems that autonomously do research and produce new AI experiments and papers, and in practice the technology might take a very different route. But once it does work, surpassing human performance might not require millions of GPUs, even for training.


Is the IMO Grand Challenge plausibly solved, or likely to be solved soon?

I think geometry problems are well known to be very easy for machines relative to humans. See e.g. here. So this doesn't seem like most of the difficulty.

Though perhaps they are only easy relative to a subset of humans:

Chapter 8: Geometry for Americans

We call this chapter "Geometry for Americans" instead of "Geometry for Dummies" so as not to offend. The sad truth is that most mathematically inclined Americans know very little geometry, in contrast to their luckier peers in Eastern Europe and Asia. But it is never too late to learn. Geometry is a particularly fun topic to study, because you are compelled to draw lots and lots of pictures.

(30% serious here)

https://arxiv.org/abs/2404.06405

"Essentially, this classic method solves just 4 problems less than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist."