Ryan Greenblatt

I'm the chief scientist at Redwood Research.

This is late, but I'd like to note for the record that I'm unconvinced that the post's thesis is true (that acausal dynamics work out to normalcy), and I don't find myself at all moved by the arguments made for it in the post.

It's unclear to me exactly what is meant by "normalcy", e.g. does it count as normalcy if people who take acausal dynamics seriously end up spending resources mostly on acausal interactions? I think this won't feel very normal from the outside.

To be clear, I agree it's quite uncertain. I put substantial chance on further open source models basically not mattering and some chance on them being very important.

I think people do use Llama 70B, and I predict people will increasingly use somewhat bigger models.

Also, it's worth noting that 70B models can seemingly be pretty close to the current frontier!

Alex Mallen also noted a connection with people generally thinking they are in race when they actually aren't: https://forum.effectivealtruism.org/posts/cXBznkfoPJAjacFoT/are-you-really-in-a-race-the-cautionary-tales-of-szilard-and

I've heard from a credible source that OpenAI substantially overestimated where other AI companies were with respect to RL and reasoning when they released o1. Employees at OpenAI believed that other top AI companies had already figured out similar things when they actually hadn't and were substantially behind. OpenAI had been sitting on the improvements driving o1 for a while prior to releasing it. Correspondingly, releasing o1 resulted in much larger capabilities externalities than OpenAI expected. I think there was one more case like this, from either OpenAI or GDM, where employees had a large misimpression about capabilities progress at other companies, causing a release they wouldn't otherwise have made.

One key takeaway from this is that employees at AI companies might be very bad at predicting the situation at other AI companies (likely making coordination more difficult by default). This includes potentially thinking they are in a close race when they actually aren't. Another update is that keeping secrets about something like reasoning models worked surprisingly well at preventing other companies from copying OpenAI's work, even though there was a bunch of public reporting (and presumably many rumors) about this.

One more update is that OpenAI employees might unintentionally accelerate capabilities progress at other actors via overestimating how close they are. My vague understanding was that they haven't updated much, but I'm unsure. (Consider updating more if you're an OpenAI employee!)

RL sample efficiency can be improved by both:

  • Better RL algorithms (including things that also improve pretraining sample efficiency like better architectures and optimizers).
  • Smarter models (in particular, smarter base models, though I'd also expect that RL in one domain makes RL in some other domain more sample efficient, at least after sufficient scale).

There isn't great evidence that we've been seeing substantial improvements in RL algorithms recently, but AI companies are strongly incentivized to improve RL sample efficiency (as RL scaling appears to be yielding large returns), and there are a bunch of ML papers which claim to find substantial RL improvements (though it's hard to be confident in these results for various reasons). So, we should probably infer that AI companies have made substantial gains in RL algorithms, though we don't have public numbers. Claude Sonnet 3.7 was much better than Sonnet 3.5, but it's hard to know how much of the gain came from RL algorithms versus other areas.

Minimally, there is a long-running trend in pretraining algorithmic efficiency, and many of these improvements should also transfer somewhat to RL sample efficiency.

As far as evidence that smarter models learn more sample efficiently, I think the DeepSeek R1 paper has some results on this. Probably it's also possible to find various pieces of support for this in the literature, but I'm more familiar with various anecdotes.

A response to Dwarkesh's post arguing continual learning is a bottleneck.

This is a response to Dwarkesh's post "Why I have slightly longer timelines than some of my guests". I originally posted this response on Twitter here.

I agree with much of this post. I also have a median of roughly 2032 for things going crazy, I agree learning on the job is very useful, and I'm also skeptical we'd see massive white collar automation without further AI progress.

However, I think Dwarkesh is wrong to suggest that RL fine-tuning can't be qualitatively similar to how humans learn. In the post, he discusses AIs constructing verifiable RL environments for themselves based on human feedback and then argues this wouldn't be flexible and powerful enough to work. But RL could be used in ways more similar to how humans learn.

My best guess is that the way humans learn on the job is mostly by noticing when something went well (or poorly) and then sample efficiently updating (with their brain doing something analogous to an RL update). In some cases, this is based on external feedback (e.g. from a coworker) and in some cases it's based on self-verification: the person just looking at the outcome of their actions and then determining if it went well or poorly.

So, you could imagine RL'ing an AI based on both external feedback and self-verification like this. And, this would be a "deliberate, adaptive process" like human learning. Why would this currently work worse than human learning?

Current AIs are worse than humans at two things which makes RL (quantitatively) much worse for them:

  1. Robust self-verification: the ability to correctly determine when you've done something well/poorly in a way which is robust to you optimizing against it.
  2. Sample efficiency: how much you learn from each update (potentially leveraging stuff like determining what caused things to go well/poorly which humans certainly take advantage of). This is especially important if you have sparse external feedback.

But, these are more like quantitative than qualitative issues IMO. AIs (and RL methods) are improving at both of these.

All that said, I think it's very plausible that the route to better continual learning routes more through building on in-context learning (perhaps through something like neuralese, though this would greatly increase misalignment risks...).

Some more quibbles:

  • For the exact podcasting tasks Dwarkesh mentions, it really seems like simple fine-tuning mixed with a bit of RL would solve his problem. So, an automated training loop run by the AI could probably work here. This just isn't deployed as an easy-to-use feature.
  • For many (IMO most) useful tasks, AIs are limited by something other than "learning on the job". At autonomous software engineering, they fail to match what humans accomplish in 3 hours, and they are typically limited by being bad agents or by being generally dumb/confused. To be clear, it seems totally plausible that for the podcasting tasks Dwarkesh mentions, learning is the limiting factor.
  • Correspondingly, I'd guess the reason we don't see people trying more complex RL-based continual learning in normal deployments is that there is lower hanging fruit elsewhere and typically something else is the main blocker. I agree that if you had human-level sample efficiency in learning, this would immediately yield strong results (e.g., you'd presumably have very superhuman AIs with 10^26 FLOP); I'm just making a claim about more incremental progress.
  • I think Dwarkesh uses the term "intelligence" somewhat atypically when he says "The reason humans are so useful is not mainly their raw intelligence. It's their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task." I think people often consider how fast someone learns on the job as one aspect of intelligence. I agree there is a difference between short feedback loop intelligence (e.g. IQ tests) and long feedback loop intelligence and they are quite correlated in humans (while AIs tend to be relatively worse at long feedback loop intelligence).
  • Dwarkesh notes "An AI that is capable of online learning might functionally become a superintelligence quite rapidly, even if there's no algorithmic progress after that point." This seems reasonable, but it's worth noting that if sample efficient learning is very compute expensive, then this might not happen so rapidly.
  • I think AIs will likely overcome poor sample efficiency to achieve a very high level of performance using a bunch of tricks (e.g. constructing a bunch of RL environments, using a ton of compute to learn when feedback is scarce, learning from much more data than humans due to "learn once deploy many" style strategies). I think we'll probably see fully automated AI R&D prior to matching top human sample efficiency at learning on the job. Notably, if you do match top human sample efficiency at learning (while still using a similar amount of compute to the human brain), then we already have enough compute for this to basically immediately result in vastly superhuman AIs (human lifetime compute is maybe 3e23 FLOP and we'll soon be doing 1e27 FLOP training runs). So, either sample efficiency must be worse or at least it must not be possible to match human sample efficiency without spending more compute per data-point/trajectory/episode.
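The compute comparison in the last bullet can be made concrete with some back-of-the-envelope arithmetic. This is just a sketch using the rough figures from the text (~3e23 FLOP for a human lifetime, ~1e27 FLOP for a near-future training run); both numbers are loose estimates, not measurements:

```python
# Rough estimates taken from the text above; both are order-of-magnitude guesses.
HUMAN_LIFETIME_FLOP = 3e23   # approximate compute used by a human brain over a lifetime
TRAINING_RUN_FLOP = 1e27     # approximate compute in a soon-to-come frontier training run

# If an AI matched top-human sample efficiency at a similar compute cost per
# "experience", one training run would buy this many lifetimes of learning:
lifetime_equivalents = TRAINING_RUN_FLOP / HUMAN_LIFETIME_FLOP
print(f"~{lifetime_equivalents:,.0f} human-lifetime-equivalents of learning")
```

The ratio comes out to roughly 3,000 human lifetimes of learning in a single run, which is why matching human sample efficiency at human-brain compute costs would imply vastly superhuman AIs almost immediately.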

Note: as a general policy I'm not planning on engaging with the comments here, this is just because I don't want to spend a bunch of time on this topic and this could easily eat a bunch of time. Sorry about that.

"I think you're saying that LWers (of which I am one) have a very low opinion of the integrity of people at Anthropic."

This is related to what I was saying but it wasn't what I was saying. I was saying "tend to be overly pessimistic about Anthropic leadership (in terms of how good of decisions Anthropic leadership will make under the LessWrong person's views and values)". I wasn't making a claim about the perceived absolute level of integrity.

Probably not worth hashing this out further, I think I get what you're saying.

How good/bad is it to work on capabilities at Anthropic?

That's the most clear-cut case, but lots of stuff trades off Anthropic's power against other things.