In 2021 I wrote what became my most popular blog post: "What 2026 Looks Like". I intended to keep writing predictions all the way to AGI and beyond, but chickened out and only published up to 2026. Well, it's finally time. I'm back, and this time I have a team...
This is a relatively clean subproblem that we came upon a few months ago while thinking about gradient hacking. We're throwing it out to the world to see if anyone can make progress. Problem: Construct a gradient hacker (definition below), or prove that one cannot exist under the given conditions....
Epistemic Status: My best guess. Epistemic Effort: ~75 hours of work put into this document. Contributions: Thomas wrote ~85% of this; Eli wrote ~15% and helped edit + structure it. Unless specified otherwise, writing in the first person is by Thomas and so are the opinions. Thanks to Miranda Zhang,...
Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under John Wentworth. Introduction: This post works off the assumption that the first AGI comes relatively soon, and has an architecture which looks basically like EfficientZero, with a few improvements: a significantly larger world model and a significantly...
Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under John Wentworth. Introduction: When trying to tackle a hard problem, a generally effective opening tactic is to Hold Off On Proposing Solutions: to fully discuss a problem and the different facets and aspects of it. This is...