AI ALIGNMENT FORUM

[ Question ]

What is the most impressive game LLMs can play well?

by Cole Wyeth
8th Jan 2025
1 min read

Epistemic status: This is an off-the-cuff question.

~5 years ago, there was a lot of exciting progress on game playing through reinforcement learning (RL). Now we have essentially switched paradigms: pretraining massive LLMs on roughly the whole internet and then apparently doing some fairly simple, unsophisticated RL on top of that. This has been successful and highly popular because interacting with LLMs is pretty awesome (at least if you haven't done it before) and they "feel" a lot more like A.G.I. There is probably somewhat more commercial use as well via code completion (some would say many other tasks too; personally I'm not really convinced, though generative image/video models will certainly be profitable). There is also a sense in which LLMs are clearly more general: one RL algorithm may learn many games, but there is typically one instance per game rather than a single integrated agent, whereas you can just ask an LLM in context to play some games.

However, I've been following moderately closely and I can't think of any examples where LLMs have really pushed the state of the art in narrow game playing. How much have LLMs contributed to RL research? For instance, would adding o3 to the stack easily stomp on previous StarCraft / Go / chess agents?
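
To make the "just ask an LLM in context" point concrete, here is a minimal sketch of what such a setup could look like for chess: the game so far is serialized as SAN text, the model is asked for one legal move, and python-chess checks the reply. The model name, prompt wording, and `llm_move` helper are illustrative assumptions, not something from the post.

```python
# A minimal sketch (assumptions, not from the post): ask an LLM for the next
# chess move in context, using python-chess to serialize the game and to
# check that the reply is a legal move. Model name and prompt are placeholders.
import chess
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_move(board: chess.Board, model: str = "gpt-4o") -> chess.Move:
    """Ask the model for one move in SAN, given the moves played so far."""
    history = chess.Board().variation_san(board.move_stack) if board.move_stack else "(game start)"
    prompt = (
        "We are playing chess. Moves so far (SAN): "
        f"{history}. Reply with a single legal move for the side to move, "
        "in SAN, and nothing else."
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    text = reply.choices[0].message.content.strip()
    return board.parse_san(text)  # raises ValueError if the reply is illegal

board = chess.Board()
board.push_san("e4")           # we play 1. e4
board.push(llm_move(board))    # the model answers for Black
print(board)
```

In practice one also has to decide how to handle illegal or unparsable replies (forfeit, retry, or substitute a random legal move), and that choice can matter for the measured results.
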
8 comments, sorted by top scoring

[-] Archimedes · 6mo

Related question: What is the least impressive game current LLMs struggle with?

I’ve heard they’re pretty bad at Tic Tac Toe.

[-] Vanessa Kosoy · 6mo

Relevant link

[-] Vanessa Kosoy · 6mo

Relevant: Manifold market about LLM chess

[-] Cole Wyeth · 6mo

Interesting, the prices seemed reasonable overall, though I traded the later dates down a little bit because if LLMs haven't won by 2030 the paradigm is probably limited (IMO they hadn't priced in that update).

I suppose that it's a slightly "unfair" comparison because chess engines are very narrow and humans can't beat them either. How do LLMs compare to top human chess players?

[-] Vanessa Kosoy · 6mo

Apparently someone let LLMs play against the random policy, and for most of them, most games end in a draw. It seems like o1-preview is the best of those tested, managing to win 47% of the time.
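
A rough sketch of how an evaluation like that might be wired up (illustrative only, not the setup behind the linked results): a random-policy opponent plus a game loop that tallies outcomes, with `llm_move` standing in for whatever model call is being tested, e.g. the sketch in the post above.

```python
# Illustrative sketch of an LLM-vs-random-policy chess evaluation; not the
# code behind the linked results. Assumes python-chess and an `llm_move`
# function such as the sketch in the post above.
import random
import chess

def random_move(board: chess.Board) -> chess.Move:
    """The random policy: choose uniformly among legal moves."""
    return random.choice(list(board.legal_moves))

def play_game(llm_move, llm_plays_white: bool, max_plies: int = 300) -> str:
    """Play one game and report the result from the LLM's point of view."""
    board = chess.Board()
    while not board.is_game_over() and board.ply() < max_plies:
        llm_to_move = (board.turn == chess.WHITE) == llm_plays_white
        try:
            move = llm_move(board) if llm_to_move else random_move(board)
        except ValueError:
            return "llm_illegal_move"   # one common convention: score as a loss
        board.push(move)
    outcome = board.outcome()
    if outcome is None or outcome.winner is None:
        return "draw"                   # includes games cut off at max_plies
    llm_color = chess.WHITE if llm_plays_white else chess.BLACK
    return "llm_win" if outcome.winner == llm_color else "llm_loss"

# e.g. tally 100 games, alternating colors:
# results = [play_game(llm_move, llm_plays_white=(i % 2 == 0)) for i in range(100)]
```

How illegal moves and cut-off games are scored is a design choice worth keeping in mind when comparing reported win rates.
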

[-] gwern · 6mo

Given the other reports, like OA's own benchmarking (as well as the extremely large dataset of chess games they mention training on), I am skeptical of this claim, and wonder if this has the same issue as other 'random chess game' tests, where the 'random' part is not neutral but screws up the implied persona.

[-] Vanessa Kosoy · 6mo

Do you mean that seeing the opponent make dumb moves makes the AI infer that its own moves are also supposed to be dumb, or something else?

[-] gwern · 6mo

Yes.

Epistemic status: This is an off-the-cuff question.

~5 years ago there was a lot of exciting progress on game playing through reinforcement learning (RL). Now we have basically switched paradigms, pretraining massive LLMs on ~the internet and then apparently doing some really trivial unsophisticated RL on top of that - this is successful and highly popular because interacting with LLMs is pretty awesome (at least if you haven't done it before) and they "feel" a lot more like A.G.I. Probably there's somewhat more commercial use as well via code completion (and some would say many other tasks, personally not really convinced - generative image/video models will certainly be profitable though). There's also a sense in which they are clearly more general - e.g. one RL algorithm may learn many games but there's typically an instance per game not one integrated agent. You can just ask an LLM in context to play some games.

However, I've been following moderately closely and I can't seem to think of any examples where LLMs really pushed the state of the art in narrow game playing  - how much have LLMs contributed to RL research? For instance, will adding o3 to the stack easily stomp on previous Starcraft / go / chess agents?