AI ALIGNMENT FORUM

Ryan Greenblatt

I'm the chief scientist at Redwood Research.

Sequences

No sequences to display.

5 · ryan_greenblatt's Shortform · 2y · 115

Comments (sorted by newest)
ryan_greenblatt's Shortform
ryan_greenblatt · 4d · 20
  • Anthropic reports SWE-bench scores without reasoning, which is some evidence that reasoning doesn't help (much) on this sort of task. (See e.g. the release blog post for Claude Opus 4.)
  • Anecdotal evidence

Probably it would be more accurate to say "doesn't seem to help much, while it helps a lot for OpenAI models".

ryan_greenblatt's Shortform
ryan_greenblatt · 5d · 190

Some people seem to think my timelines have shifted a bunch while they've only moderately changed.

Relative to my views at the start of 2025, my median (50th percentile) for AIs fully automating AI R&D was pushed back by around 2 years—from something like Jan 2032 to Jan 2034. My 25th percentile has shifted similarly (though perhaps more importantly) from maybe July 2028 to July 2030. Obviously, my numbers aren't fully precise and vary some over time. (E.g., I'm not sure I would have quoted these exact numbers for this exact milestone at the start of the year; these numbers for the start of the year are partially reverse engineered from this comment.)

Fully automating AI R&D is a pretty high milestone; my current numbers for something like "AIs accelerate AI R&D as much as what would happen if employees ran 10x faster (e.g. by ~fully automating research engineering and some other tasks)" are probably 50th percentile Jan 2032 and 25th percentile Jan 2029.[1]

I'm partially posting this so there is a record of my views; I think it's somewhat interesting to observe this over time. (That said, I don't want to anchor myself, which does seem like a serious downside. I should slide around a bunch and be somewhat incoherent if I'm updating as much as I should: my past views are always going to be somewhat obviously confused from the perspective of my current self.)

While I'm giving these numbers, note that I think Precise AGI timelines don't matter that much.


  1. See this comment for the numbers I would have given for this milestone at the start of the year. ↩︎

ryan_greenblatt's Shortform
ryan_greenblatt · 6d · 30

I've updated towards somewhat longer timelines again over the last 5 months. Maybe my 50th percentile for this milestone is now Jan 2032.

AIs will greatly change engineering in AI companies well before AGI
ryan_greenblatt · 6d · 40

Mostly some AI company employees with shorter timelines than me. I also think that "why I don't agree with X" is a good prompt for expressing some deeper aspect of my models/views. It also makes a reasonably engaging hook for a blog post.

I might write some posts responding to arguments for longer timelines that I disagree with, if I feel like I have something interesting to say.

ryan_greenblatt's Shortform
ryan_greenblatt · 12d · 3870

Precise AGI timelines don't matter that much.

While I do spend some time discussing AGI timelines (and I've written some posts about it recently), I don't think moderate quantitative differences in AGI timelines matter that much for deciding what to do[1]. For instance, having a 15-year median rather than a 6-year median doesn't make that big of a difference. That said, I do think that moderate differences in the chance of very short timelines (i.e., less than 3 years) matter more: going from a 20% chance to a 50% chance of full AI R&D automation within 3 years should potentially make a substantial difference to strategy.[2]

Additionally, my guess is that the most productive way to engage with discussion around timelines is mostly to not care much about resolving disagreements, but then when there appears to be a large chance that timelines are very short (e.g., >25% in <2 years) it's worthwhile to try hard to argue for this.[3] I think takeoff speeds are much more important to argue about when making the case for AI risk.

I do think that having somewhat precise views is helpful for some people doing relatively fine-grained prioritization among those already working on safety, but this seems pretty niche.

Given that I don't think timelines are that important, why have I been writing about this topic? This is due to a mixture of: I find it relatively quick and easy to write about timelines, my commentary is relevant to the probability of very short timelines (which I do think is important as discussed above), a bunch of people seem interested in timelines regardless, and I do think timelines matter some.

Consider reflecting on whether you're overly fixated on details of timelines.


  1. I've seen Richard Ngo make this point before, though I couldn't find where he did this. More generally, this isn't a very original point; I just think it's worth making given that I've been talking about timelines recently. ↩︎

  2. I also think that the chance that very powerful AI happens under this presidential administration is action-relevant for policy. ↩︎

  3. You could have views such that you expect to never be >25% confident in <2-year timelines until it's basically too late. For instance, maybe you expect very fast takeoff driven by a single large algorithmic advance. Under this view, I think arguing about the details of timelines looks even less good and you should mostly make the case for risk independently of this, perhaps arguing "it seems like AI could emerge quickly and unexpectedly, so we need to act now". ↩︎

Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
ryan_greenblatt · 13d · 50
  • I expect lower cost per rollout on average because AI companies do RL on a bunch of smaller tasks and don't necessarily use tons of reasoning tokens on most envs. Also, API prices are marked up relative to what companies actually pay for compute, which can easily add a factor of 3. If we are just looking at the hardest agentic software engineering environments, then this closes the gap a decent amount.
  • I expect spending on RL environments to be more like 10x lower than RL training compute rather than similar (and I wouldn't be surprised by an even larger gap), because it's hard to massively scale up spending on RL envs effectively in a short period of time, while we already have a scaled-up industrial process for buying more compute.

I'm more sympathetic to "companies will spend this much on some high quality RL envs" than "the typical RL env will be very expensive", but I think some disagreement remains.
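A minimal sketch of the back-of-envelope arithmetic above, assuming illustrative placeholder inputs: the per-rollout API price and the RL training compute budget below are hypothetical numbers chosen for the example, and only the ~3x API markup and ~10x env-spend ratio come from the comment.

```python
# Back-of-envelope sketch of the cost factors discussed above.
# The dollar figures are hypothetical placeholders, not figures from the comment.

api_price_per_rollout = 3.00   # hypothetical list-price API cost of one hard agentic rollout ($)
api_markup = 3.0               # comment: API prices can be ~3x what companies actually pay for compute
internal_cost_per_rollout = api_price_per_rollout / api_markup

rl_training_compute = 1e9      # hypothetical total RL training compute spend ($)
env_spend_ratio = 1 / 10       # comment: RL env spending maybe ~10x lower than RL training compute
rl_env_spend = rl_training_compute * env_spend_ratio

print(f"internal compute cost per rollout: ${internal_cost_per_rollout:.2f}")
print(f"implied RL environment spend:      ${rl_env_spend:,.0f}")
```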

Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
ryan_greenblatt · 13d* · 40

Thanks, I wasn't aware of this post. (I think it overstates the level of spending we'll see on the average RL env within a year by maybe 10x or more, but I agree directionally.)

Buck's Shortform
ryan_greenblatt · 23d · 110

And the whole exchange on Nitter, for those who don't like going on X/Twitter.

ryan_greenblatt's Shortform
ryan_greenblatt · 1mo · 40

I wrote an update here.

Towards Alignment Auditing as a Numbers-Go-Up Science
ryan_greenblatt · 1mo · 98

This seems pretty reasonable (strong upvoted).

These aren't disagreements with the post, but here are some of my thoughts (most of which Sam already knows, but I'm sharing for third parties):

My largest concern is that the auditing test beds will end up being pretty disanalogous to the cases of misalignment I find most concerning, and thus hill climbing on these test beds isn't that useful. (You discuss the need for high-quality environments in the appendix. I agree.) In practice, my understanding is that constructing good auditing test beds has been a huge pain, especially because current AI systems have trouble "keeping their shit together" (and generally are unreliable at concealing secret aims), which makes behavioral red-teaming very powerful. I don't expect behavioral red-teaming to work for the AI systems I'm most worried about.[1] To be clear, the issues I discuss in this paragraph apply to testing alignment auditing methods in general.

Another concern is that current AI systems (and future AIs which aren't much more capable) are pretty different from humans, which could limit how well the conclusions transfer, though this doesn't mean that hill climbing wouldn't have substantial transfer. E.g., maybe some tool helps AI agents uncover issues with less need for cognitive effort, agency, or noticing things, and this makes a huge difference to the bottom line due to AI capability limitations, while humans would only be marginally helped by the tool because they could just apply more effort.


  1. There is a decent chance that very powerful AIs still have trouble keeping their shit together, but such AIs are correspondingly much easier to alignment audit (at least for hidden intentions), so I also think these worlds are less leveraged to work on. ↩︎

Posts

24 · AIs will greatly change engineering in AI companies well before AGI · 7d · 2
57 · Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro · 13d · 6
48 · Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements) · 20d · 2
56 · My AGI timeline updates from GPT-5 (and 2025 so far) · 1mo · 1
47 · Recent Redwood Research project proposals · 2mo · 0
35 · Jankily controlling superintelligence · 3mo · 0
22 · Prefix cache untrusted monitors: a method to apply after you catch your AI · 3mo · 1
36 · AI safety techniques leveraging distillation · 3mo · 0
40 · When does training a model change its goals? · 3mo · 0
21 · OpenAI now has an RL API which is broadly accessible · 3mo · 0
Wikitag Contributions

Anthropic (org) · 9mo · (+17/-146)
Frontier AI Companies · 1y
Frontier AI Companies · 1y · (+119/-44)
Deceptive Alignment · 2y · (+15/-10)
Deceptive Alignment · 2y · (+53)
Vote Strength · 2y · (+35)
Holden Karnofsky · 2y · (+151/-7)
Squiggle Maximizer (formerly "Paperclip maximizer") · 2y · (+316/-20)