Thomas Larsen — AI Alignment Forum

AI 2027: What Superintelligence Looks Like

by Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, and romeo

In 2021 I wrote what became my most popular blog post: What 2026 Looks Like. I intended to keep writing predictions all the way to AGI and beyond, but chickened out and just published up till 2026. Well, it's finally time. I'm back, and this time I have a team...

Apr 3, 2025 · 696

Challenge: construct a Gradient Hacker

This is a relatively clean subproblem that we came upon a few months ago while thinking about gradient hacking. We're throwing it out to the world to see if anyone can make progress. Problem: Construct a gradient hacker (definition below), or prove that one cannot exist under the given conditions....

Mar 9, 2023 · 42

Thomas Larsen's Shortform

Nov 8, 2022 · 6

(My understanding of) What Everyone in Technical Alignment is Doing and Why

Epistemic Status: My best guess. Epistemic Effort: ~75 hours of work put into this document. Contributions: Thomas wrote ~85% of this, Eli wrote ~15% and helped edit + structure it. Unless specified otherwise, writing in the first person is by Thomas and so are the opinions. Thanks to Miranda Zhang,...

Aug 29, 2022 · 415

Finding Goals in the World Model

by Jeremy Gillen, JamesH, and Thomas Larsen

Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under John Wentworth. Introduction: This post works off the assumption that the first AGI comes relatively soon, and has an architecture which looks basically like EfficientZero, with a few improvements: a significantly larger world model and a significantly...

Aug 22, 2022 · 59

The Core of the Alignment Problem is...

Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under John Wentworth. Introduction: When trying to tackle a hard problem, a generally effective opening tactic is to Hold Off On Proposing Solutions: to fully discuss a problem and the different facets and aspects of it. This is...

Aug 17, 2022 · 76