A breakdown of AI capability levels focused on AI R&D labor acceleration

ryan_greenblatt

In a variety of conversations about AI misalignment risks, I find that it is important to be able to clearly point at different levels of AI capability. My current favorite approach is to talk about how much the AI accelerates AI R&D^[1] labor.

I define acceleration of AI R&D labor by Y times as "the level of acceleration which is as useful (for making more powerful AIs) for an AI company as having its employees run Y times faster^[2] (when you allow the total inference compute budget for AI assistance to be equal to total salaries)". Importantly, a 5x AI R&D labor acceleration won't necessarily mean that research into making AI systems more powerful happens 5x faster, as this just refers to increasing the labor part of the production function, and compute might also be an important input.^[3] This doesn't include acceleration of hardware R&D (as a pragmatic simplification).

Further, when I talk about AIs that can accelerate AI R&D labor by some factor, that means after being given some reasonable amount of time for human integration (e.g., 6 months) and given broad usage (but keeping fine-tuning and elicitation fixed during this integration time).

Why might this be a good approach? Because ultimately what we're worried about is AIs which can greatly accelerate R&D in general, and AI R&D in particular is worth focusing on as it could yield much faster AI progress, quickly bringing us to much greater levels of capability.

Why not just talk about the overall acceleration of AI progress (i.e., increases in the rate of effective compute increases as discussed in the Anthropic RSP) rather than just the labor input into AI R&D? Because for most misalignment-related discussions, I'd prefer to talk about capability levels mostly independent of exogenous factors that determine how useful that level of capability actually ends up being (i.e., independent from the extent to which compute is a bottleneck to AI research or the fraction of progress driven by scaling up hardware rather than algorithms). Rather than talking about overall AI progress or software progress labor acceleration, we could talk about the overall acceleration of just AI software progress (just algorithms, not compute increases)^[4], but this just adds the potential for compute bottlenecks without much benefit in discussions related to technical measures of misalignment. AI R&D labor acceleration doesn't fully avoid exogenous factors, but it avoids many such factors while still getting at a relevant and specific task.

I'll compare this approach to several alternatives later.

So, now we can talk about levels of capability like "3x AI R&D labor AIs". I'll call such systems "3x AIs" as shorthand.

Beyond discussing AI R&D labor acceleration, I think it is often useful to talk about the point when human cognitive labor is totally obsolete. Thus, I think it also makes sense to separately talk about Top-human-Expert-Dominating AI (TEDAI): AIs which strictly dominate top human experts^[5] in virtually all cognitive tasks (i.e., doable via remote work) while being at least 2x faster^[6] and within a factor of 5 on cost^[7]. It is very unclear what level of AI R&D labor acceleration would occur with such systems, and this would be heavily dependent on factors like cost, speed, and the parallelizability of research^[8]. Sometimes the term AGI is defined such that TEDAI is equivalent to AGI, but I think defining a different precise term is useful for clarity.

Beyond the level of TEDAI, it can be worth pointing at very generally superhuman AIs: AIs which are generally qualitatively much more capable than humans and greatly dominate humans in virtually all cognitive tasks (while being faster). This level of capability is much less precise, and it is very hard to say much at all about such systems.

Now, we can talk about the following levels of capability:

3x AIs
10x AIs
TEDAI
Very generally superhuman AIs

(Thanks to Ajeya Cotra, Cody Rushing, Eli Lifland, Nate Thomas, Zach Stein-Perlman, Buck Shlegeris, and Claude 3.5 Sonnet for feedback on this post.)

What do I think these levels of capability look like?

Now that I've outlined these levels of capability, we can discuss what they might look like and what the rough conversion into other frameworks (like t-AGI) might be. I'll make some rough guesses here.

My sense is:

3x AIs:
- Qualitative: The AI generally feels as smart as a pretty junior engineer (bottom 25% of new Google junior hires), but it is super knowledgeable, very good at some random tasks, very fast, very persistent, and is better than a pretty junior engineer at being an agent and generally knowing how to resolve software issues. It also looks really dumb in certain ways, sometimes does really stupid things, and sometimes has big robustness issues. The AIs are speeding things up partially via close collaboration with humans (imagine Cursor but with AIs being somewhat more autonomous) and partially via autonomously doing longer-run tasks while asking for human help.
- Total AI progress speed up: 3x AI R&D maybe corresponds to roughly 1.6x overall AI progress speed up, though this depends heavily on how much AI capabilities research is compute-bottlenecked. (With no bottleneck, 1.8x; with 50% reduction in acceleration due to bottleneck, 1.4x.)
  - The BOTEC here assumes 2/5 of AI progress comes from software; we compute 2/5 * (2 * (1 - compute-bottleneck-tax) + 1) + 3/5, where the 2 is the labor multiplier minus one (3 - 1). I think the compute bottleneck tax is probably around 25% with this acceleration (really, we should think about this in terms of tax brackets). (I think square rooting the labor multiplier is also a reasonable starting guess for the compute bottleneck.)
- Anthropic ASL: This is probably late ASL-3 or early ASL-4 on Anthropic's breakdown (my speculation from public knowledge in the latest version of the RSP). It probably isn't ASL-4 for AI R&D but might be ASL-4 for bio or cyber.
- t-AGI: 4 hours (???)
10x AIs:
- Qualitative: The AI generally feels as smart as a median engineer at a top AI company (OpenAI or Anthropic) while preserving (and in some ways increasing) its advantages discussed in the above bullet (knowledge, speed, agency) and still being somewhat less robust than a human.
- Total AI progress speed up: This is maybe around 3.2x overall AI progress speed up. (Again, this depends on compute bottlenecking; with no bottleneck, 4.6x; with 50% bottleneck, 2.8x.) These numbers don't include acceleration due to anything other than AI software R&D, as hardware R&D has longer lead times.
- Anthropic ASL: This is probably early ASL-5 or perhaps late ASL-4.
- t-AGI: 2 days (???)
Top-human-Expert-Dominating AI (TEDAI):
- Qualitative: The AI generally feels roughly as smart as a top human expert and is able to dominate across virtually all domains via increasing capabilities further with other advantages.
- Total AI progress speed up: Very unclear and highly dependent on environmental factors. Perhaps AI R&D labor acceleration is >30x. I've seen BOTECs indicating roughly 15x overall AI progress speed. Human help is no longer relevant.
- Anthropic ASL: Should be ASL-5 or higher.
- t-AGI: >1 year
Very generally superhuman: ??? Everything is really, really hard to predict (as opposed to merely very hard to predict)
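
The BOTEC arithmetic in the bullets above can be sketched in a few lines of Python. This is a minimal sketch rather than a model the post commits to: the function names (`overall_speedup`, `overall_speedup_sqrt`) are hypothetical, the compute bottleneck is treated as a single flat tax rather than the "tax brackets" mentioned above, and the square-root variant is only one possible reading of the "square rooting the labor multiplier" heuristic.

```python
import math

# Fraction of AI progress attributed to software in the BOTEC above.
SOFTWARE_SHARE = 2 / 5


def overall_speedup(labor_multiplier, bottleneck_tax, software_share=SOFTWARE_SHARE):
    """Overall AI progress speed-up under a flat compute-bottleneck tax.

    Assumption: the tax applies uniformly to the labor acceleration above 1x.
    """
    software_speedup = (labor_multiplier - 1) * (1 - bottleneck_tax) + 1
    return software_share * software_speedup + (1 - software_share)


def overall_speedup_sqrt(labor_multiplier, software_share=SOFTWARE_SHARE):
    """Variant that takes the square root of the labor multiplier instead of a tax."""
    software_speedup = math.sqrt(labor_multiplier)
    return software_share * software_speedup + (1 - software_share)


if __name__ == "__main__":
    for multiplier in (3, 10):
        for tax in (0.0, 0.25, 0.5):
            speedup = overall_speedup(multiplier, tax)
            print(f"{multiplier}x labor, {tax:.0%} tax: {speedup:.2f}x overall")
        sqrt_speedup = overall_speedup_sqrt(multiplier)
        print(f"{multiplier}x labor, sqrt heuristic: {sqrt_speedup:.2f}x overall")
```

Under this flat-tax form, taxes of 0%, 25%, and 50% give 1.8x, 1.6x, and 1.4x for 3x AIs, and taxes of 0% and 50% give 4.6x and 2.8x for 10x AIs. The roughly 3.2x guess for 10x AIs would correspond to a tax of about 39%.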

My qualitative guesses are focused on something like a nearcast with more focus on timelines where AI approaches haven't massively changed from where it looks like current approaches are going. This is because other cases are much harder to say anything about (and probably involve longer timelines).

Alternative capability breakdowns

t-AGI

I have two main problems with t-AGI:

I don't feel confident that horizon length will be the key variable, so I don't want to bake that into how we discuss capability levels. While AI R&D labor acceleration also makes some implicit assumptions, these assumptions seem much weaker.
I don't feel like I have a very good handle on what various levels of t-AGI feel like, what level of t-AGI we have now, or even how one would measure this in principle. I do think we can measure AI R&D labor acceleration in principle, and I feel like I have a much better intuitive model.

Anthropic's ASL levels

These aren't defined above ASL-3, and the intention is that they will be defined with respect to the necessary level of mitigations (which in my opinion seems likely to focus on security). I've run into some cases where confusion about how ASL levels will end up being defined has caused issues with communication.

Purely qualitative breakdowns

Above, I describe qualitative intelligence of different systems. I expect that people will disagree radically about this (and already do). This is certainly hard to operationalize regardless. So, while this is often worth referencing, I don't think it should be the default approach to discussing capability levels.

Total AI progress speedup or total AI software progress (including compute bottlenecks)

As discussed above, I'm worried that total AI progress speed up pulls in a bunch of exogenous factors people often disagree about. A similar issue related to compute bottlenecks applies if you consider overall AI software progress speed up (rather than merely the labor input into this).

Will all these levels be passed at once?

I expect takeoff to be slow enough that we'll see 3x AIs more than a year before very generally superhuman AIs, but it is unclear how slowly or smoothly we'll progress through units of AI R&D labor acceleration by default. Additionally, adoption delays make the picture more complex. Nonetheless, to the extent you are interested in talking about whether various mitigations would work at different levels of capability, I think AI R&D labor acceleration can be useful for this.

Conclusion

The AI R&D labor acceleration framework seems like a good approach for measuring and discussing AI capabilities, particularly when discussing misalignment risk and mitigations. It compromises between a focus on the downstream implications of a capability level and a more qualitative measurement of capability while still being relatively precisely defined.

[1] I use AI R&D, but I expect these numbers would probably transfer fine to any sort of R&D that can be done digitally (in software), which is as measurable as AI R&D, and which the AIs are optimized for as much as AI R&D.
[2] Relative to only having access to AI systems publicly available in January 2023.
[3] You can also think about this as roughly being: "consider the subset of tasks that aren't bottlenecked by delays/costs in the environment (e.g., not bottlenecked by compute), how much can AIs accelerate people on average".
[4] Sometimes "software progress overall acceleration" is referred to as "software progress productivity acceleration", but I find "overall" clearer than "productivity".
[5] That is, top human experts with access only to AIs available by January 2023. This is done to avoid the edge case where the human mostly or fully defers to an AI system such that comparing to humans is just comparing the AI to itself. This also avoids comparing to future humans who are substantially augmented by AIs, which could be misleading when thinking about the capability threshold and overall makes this harder to reason about.
[6] That is, 2x faster at accomplishing the tasks.
[7] This post originally said 2x cheaper, but I realized this operationalization has an issue: once AIs dominate top human experts, we would eventually expect human wages to drop and compute costs to rise until employers are more indifferent (at least for usages that don't require trust and putting aside wage stickiness). One alternative way to operationalize this would be to fix compute prices and wages to the prices we would expect putting aside the effect of the AI automating labor (e.g., extrapolating out compute costs and wages based on earlier trends) and then say "2x cheaper".
[8] Beyond human obsolescence, I think it generally becomes less helpful to talk about AI R&D labor acceleration when trying to point at different levels of capability for discussion about misalignment risks and mitigations. Partially this is because our understanding of what the systems will look like gets even worse after human obsolescence.

In retrospect, I wish I had also included "AIs capable of full automation of AI R&D" (Superhuman AI Researcher (SAR) from AI 2027) as another level of capability. This is probably below TEDAI. TEDAI implies full automation of AI R&D is possible if we put aside cost constraints, but TEDAI might also require a substantially higher level of capability, particularly due to requiring resolving all deficiencies relative to the human capability profile, including for things like vision.

I think an issue with the definition of TED-AI is that there are lots of tasks that benefit a ton from detailed experience and knowledge that AIs won't have by default. For example, there are probably a bunch of engineers deep in the compute supply chain who have very detailed knowledge and experience with managing their particular operation. I think we'll spend an important period of the intelligence explosion in a state where AIs would be able to automate these engineers' jobs given, say, a month of learning and assistance from them, but couldn't do it on their own. This could really delay the point of "TED-AI" far beyond the point where "The AI generally feels roughly as smart as a top human expert" and where I'll feel like AIs are crushing humans on all the important tasks. (Because for all the most important tasks, people will have put in the work to make AIs dominate humans on those tasks. And it doesn't matter much that there are a bunch of tasks in the old human economy that haven't been worth that attention yet.)

(IIRC, Tom Davidson's take-off speed model tries to get around something similar by talking about what tasks are "readily" automatable, meaning "i) it would be profitable for organisations to do the engineering and workflow adjustments necessary for AI to perform the task in practice, and ii) they could make these adjustments within 1 year if they made this one of their priorities." I think the 1 year thing sounds too slow though, given that we want to track milestones that may be only months apart. Also, I'm not sure we should condition on it being profitable for orgs to do this given that the opp-cost of AI labor might be really high at the time and it seems pretty fine and cheap to just have humans keep doing some of their old jobs for a bit longer, if they're not in a bottlenecking part of the economy.)

When I use the term TED-AI these days, I mean "you'd strictly prefer to hire the AI over any human expert putting aside narrow experience that's very specific to this job in important domains like mechanical engineering, drone R&D, etc (like the AI dominates putting aside this narrow expertise)". So I do mean something more like readily automatable. The definition I gave in this post does underspecify this.

I think it is often useful to talk about the point when human cognitive labor is totally obsolete. Thus, I think it also makes sense to separately talk about Top-human-Expert-Dominating AI (TEDAI): AIs which strictly dominate top human experts^[5] in virtually all cognitive tasks (i.e., doable via remote work) while being at least 2x faster^[6] and within a factor of 5 on cost^[7].

Hm, but human cognitive labor isn't totally obsolete, because there are still those tasks where humans are a factor of 5 cheaper.

Similarly for the definition in footnote 7: If AIs simultaneously became uniformly 2x cheaper than humans on all the tasks holding current compute prices fixed, then that's not actually that impressive. Currently compute spend is a small fraction of spend on human cognitive labor. So even if we got AI software that could generate $2 of cognitive labor revenue for every $1 spent on compute, most of the cognitive labor income would still be accruing to humans.

Of course, TED-AI won't be uniformly cheaper. It'll have spiky capabilities compared to the human capability distribution. So the impressiveness comes from how, at the point where it's only 5x more expensive than humans on the cognitive task where it has the least comparative advantage (or 2x cheaper according to extrapolated compute prices on its most expensive task), then surely it's far superhuman on most things, and humans are totally obsolete on most things.

It does feel kind of off to me that this would correspond to:

The AI generally feels roughly as smart as a top human expert and is able to dominate across virtually all domains via increasing capabilities further with other advantages

I would imagine that the AI would feel much smarter than a top human expert in more than half of domains, but that there would then be a few domains where it has surprising weaknesses and where those "other advantages" don't help that much.

(Somewhat less relevant point: Probably the AI will also feel very smart because "feel smart" is easier to train for than to actually do a good job in a bunch of domains, so the AI will succeed at that before it's actually doing a good job.)

A more specific argument for expecting spikiness, and therefore for expecting TED-AI to be vastly superhuman in most areas:

The human brain is much more sample efficient than machine learning.
In some areas, we can't generate many more samples than what humans can engage with in a lifetime. (E.g.: Forecasting of rare events or results of extremely expensive experiments.) As long as human sample efficiency is better than AI sample efficiency, it will be great to employ humans in these areas.
Once ML beats human sample efficiency, ML will be vastly superhuman in all the areas where there's vastly more data than what humans could absorb in a lifetime.

Possible defeaters to that argument:

Something about generalization/transfer learning?
Maybe you can use some tricks for sample-efficient learning in the low-data regime that don't transfer to the high-data regime. (E.g., put all the data points in-context and reason about them; this wouldn't generalize to a number of data points that's much greater than what fits in the context window.) And therefore beat humans in the low-data regime without necessarily being able to vastly outperform them in the high-data regime.

Maybe I'm just confused about what sense of "feel smart" and what "other advantages" you have in mind such that the advantages help the AI dominate a bunch of domains but don't cause the AI to feel much smarter than top experts. Are you imagining really limiting the parallel compute that can be used (perhaps to one "copy" of the model, if those concepts still make sense in the future), with most of the advantages coming from increasing parallel compute and coordination of that?