Interesting, the prices seemed reasonable overall though I traded the later dates down a little bit because if LLMs haven't won be 2030 the paradigm is probably limited (IMO they hadn't priced in that update).

I suppose that it's a slightly "unfair" comparison because chess engines are very narrow and humans can't beat them either. How do LLMs compare to top human chess players?

Reply

[-]Vanessa Kosoy10mo30

Apparently someone let LLMs play against the random policy and for most of them, most games end in a draw. Seems like o1-preview is the best of those tested, managing to win 47% of the time.

Reply

[-]gwern10mo40

Given the other reports, like OA's own benchmarking (as well as the extremely large dataset of chess games they mention training on), I am skeptical of this claim, and wonder if this has the same issue as other 'random chess game' tests, where the 'random' part is not neutral but screws up the implied persona.

Reply

[-]Vanessa Kosoy10mo20

Do you mean that seeing the opponent make dumb moves makes the AI infer that its own moves are also supposed to be dumb, or something else?

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

9

[ Question ]

What is the most impressive game LLMs can play well?

9