Congrats on getting this out! I am overall excited about exploring models that rely more on uplift than on time horizons. A few thoughts:
It might be nice to indicate how these outputs relate to your all-things-considered views. To me your explicit model seems to be implausibly confident in 99% automation before 2040.
In particular, the "doubling difficulty growth factor", which measures whether time horizon increases superexponentially, could change the date of automated coder from 2028 to 2049! I suspect that time horizon is too poorly defined to nail down this parameter, and rough estimates of more direct AI capability metrics like uplift can give much tighter confidence intervals.
I am skeptical that uplift measurements actually give much tighter confidence intervals.
After talking this over with Thomas, we both agree that directly using uplift measurements rather than time horizon could plausibly be better by the end of 2026, though we might have different intuitions about the precise likelihood.
The effective labor:compute ratio only changes by 10-100x during the period in question, so it doesn't affect the results much anyway. The fastest trajectories are the most affected by the compute:labor ratio, but for trajectories that reach 99% automation around 2034, the ratio stays around 1:1.
This isn't true in our model because we allow full coding automation. Given that this is the case in your model, Cobb-Douglas seems like a reasonable approximation.
In this post, I describe a simple model for forecasting when AI will automate AI development. It is based on the AI Futures model (AIFM), but is more understandable and robust, and makes deliberately conservative assumptions.
At current rates of compute growth and algorithmic progress, this model's median prediction is >99% automation of AI R&D in late 2032. Most simulations result in a 1,000x to 10,000,000x increase in AI efficiency and a 300x-3,000x increase in research output by 2035. I therefore suspect that existing trends in compute growth and automation will still produce extremely powerful AI on "medium" timelines, even if the full coding automation and superhuman research taste that drive the AIFM's "fast" timelines (superintelligence by ~mid-2031) don't happen.
Why make this?
Scope and limitations
First, this model doesn't treat research taste and software engineering as separate skills/tasks. As such, I see it as making predictions about timelines (time to Automated Coder or Superhuman AI Researcher), not takeoff (the subsequent time from SAR to ASI and beyond). The AIFM can model takeoff because it has a second phase where the SAR's superhuman research taste causes further AI R&D acceleration on top of coding automation. If superhuman research taste makes AI development orders of magnitude more efficient, takeoff could be faster than this model predicts.
Second, this model, like AIFM, doesn't track effects on the broader economy that feed back into AI progress the way Epoch's GATE model does.
Third, we deliberately make two conservative assumptions:
This was constructed and written up fairly quickly (about 15 hours of work), so my opinions on parameters and some of the modeling assumptions could change in the future.
The model
We assume that AI development has the following dynamics:
This implies the following model:
$$S'(t) = R(t)\, S^{1-\beta} = \left(\frac{L}{1-f}\right)^{\alpha} C^{\zeta}\, S^{1-\beta}$$

$$f(t) = \sigma\!\left(v\left(\log\big(C(t)\,S(t)\big) - \log E_{\mathrm{hac}}\right)\right)$$

where
None of the components of this model are novel to the AI forecasting literature, but I haven't seen them written up in this form.
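As a concrete illustration, here is a minimal Euler-integration sketch of these dynamics in Python. All parameter and resource values below are illustrative placeholders (not the notebook's fitted values), and compute growth is assumed exponential for simplicity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(alpha, zeta, beta, v, E_hac, L=1.0, C0=1.0, g_C=1.35,
             S0=1.0, t0=2026.0, t1=2040.0, dt=0.01):
    """Euler integration of the model above. Labor L is held constant and
    compute grows exponentially, C(t) = C0 * g_C**(t - t0); these resource
    assumptions are placeholders, not the notebook's."""
    ts = np.arange(t0, t1, dt)
    S = np.empty_like(ts)
    f = np.empty_like(ts)
    S[0] = S0
    for i, t in enumerate(ts):
        C = C0 * g_C ** (t - t0)
        # logistic automation fraction in log effective compute C(t) * S(t)
        f[i] = sigmoid(v * (np.log(C * S[i]) - np.log(E_hac)))
        if f[i] > 1 - 1e-6:  # cut trajectories off for numerical stability
            return ts[:i + 1], S[:i + 1], f[:i + 1]
        if i + 1 < len(ts):
            R = (L / (1.0 - f[i])) ** alpha * C ** zeta  # research production R(t)
            S[i + 1] = S[i] + dt * R * S[i] ** (1.0 - beta)
    return ts, S, f
```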
Parameter values
The parameters are derived from the following assumptions, which are essentially educated guesses informed by other AI timelines models and by asking around:
Each underlying parameter is sampled independently from a triangular distribution. However, because alpha, zeta, and v are obtained by transforming those draws, v is not itself triangular, and alpha and zeta are neither triangular nor independent.
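A sketch of the sampling scheme, with hypothetical (left, mode, right) bounds standing in for the actual priors (see the notebook below for the real values); the transform structure is likewise illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_params():
    # Triangular priors with ILLUSTRATIVE bounds; the raw draws below and
    # the transforms to alpha, zeta, and v are hypothetical stand-ins.
    beta = rng.triangular(0.3, 0.5, 0.7)         # diminishing returns to software
    returns = rng.triangular(1.0, 2.0, 4.0)      # hypothetical raw draw
    labor_share = rng.triangular(0.3, 0.5, 0.7)  # hypothetical raw draw
    alpha = returns * labor_share                # transformed -> not triangular
    zeta = returns * (1.0 - labor_share)         # correlated with alpha
    v = 1.0 / rng.triangular(0.5, 1.0, 2.0)      # reciprocal draw -> not triangular
    return dict(alpha=alpha, zeta=zeta, beta=beta, v=v)
```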
For more information see the notebook: https://github.com/tkwa/ai-takeoff-model/blob/main/takeoff_simulation.ipynb
Graphs
All graphs display 40 trajectories with parameters sampled according to the section Parameter Values.
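A minimal plotting sketch for such a figure, reusing the simulate and sample_params sketches above (the E_hac threshold here is an arbitrary placeholder):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for _ in range(40):
    ts, S, f = simulate(**sample_params(), E_hac=1e3)  # placeholder threshold
    ax.plot(ts, f, alpha=0.5)
ax.set_yscale("logit")  # matplotlib's built-in logit scale
ax.set_xlabel("Year")
ax.set_ylabel("Automation fraction f")
plt.show()
```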
Automation fraction f across the 40 trajectories (logit scale). Most trajectories reach 99% automation of AI R&D by the early-to-mid 2030s.
40 sampled trajectories of the model. Top left: software level S grows subexponentially (but very fast) as automation accelerates research. Top right: the parallel compute:labor ratio C/(L/(1−f)) (raw resource ratio before diminishing returns) decreases if automation is fast, but is ~constant if automation is on track for 99% by ~2034. Bottom left: research production R(t) increases by orders of magnitude. Bottom right: the serial compute:labor ratio Cζ/(L/(1−f))α (with diminishing returns exponents) trends upward. Trajectories are cut off at 99.9999% automation for numerical stability.
Sensitivity analysis: median year of 99% automation as a function of each parameter, with the other parameters sampled from their prior distributions. Higher beta (diminishing returns to software improvement) and higher 1/v (slower automation) delay 99% automation the most, while the other parameters have modest effects.
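The sensitivity analysis can be reproduced along these lines, again reusing the sketches above: sweep one parameter, resample the rest from their priors, and record the first year each trajectory crosses 99% automation:

```python
def year_of_99pct(params, E_hac=1e3):
    ts, _, f = simulate(**params, E_hac=E_hac)
    hit = np.nonzero(f >= 0.99)[0]
    return ts[hit[0]] if hit.size else np.inf  # inf if never reached

def sensitivity(name, values, n=200):
    """Median year of 99% automation as `name` is swept, with the other
    parameters resampled from their priors on every draw."""
    medians = []
    for val in values:
        years = []
        for _ in range(n):
            params = sample_params()
            params[name] = val  # pin the swept parameter
            years.append(year_of_99pct(params))
        medians.append(np.median(years))
    return medians
```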
Observations
Discussion
From playing around with this and other variations on the AI Futures model, I think any reasonable timelines model will predict superhuman AI researchers before 2036 unless AI progress hits a wall or is deliberately slowed.
In addition to refining the parameter values with empirical data, I would ideally want to backtest this model on data before 2026. However, a backtest is likely not feasible because automation was minimal before 2025, and automation of AI R&D is the main effect being modeled here.
More on modeling choices
List of differences from the AIFM
It may be useful to cross-reference this with my AIFM summary.
How could we better estimate the parameters?
We can get f(2026) [uplift fraction in 2026] from
v [velocity of automation as capabilities improve] can be obtained by
Why is automation logistic?
Why are labor and compute Cobb-Douglas?
In the AIFM, the median estimate for substitutability between labor and compute is −0.15, and the plausible range includes zero (which would be Cobb-Douglas). I asked Eli why they didn't just say it was Cobb-Douglas, and his answer was roughly that Cobb-Douglas gives infinite progress if one of labor/compute goes to infinity while the other stays constant, which is implausible. I have two responses to this:
Why is there no substitutability between tasks?
The AIFM's median was something like ρ = −2.0, meaning very weak substitution between tasks. To be conservative, I assumed no substitution effect.
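For reference, the standard CES aggregator and its limits (a textbook identity, not specific to this post): with inputs $X_1, X_2$ and weight $w$,

$$R = \big((1-w)\,X_1^{\rho} + w\,X_2^{\rho}\big)^{1/\rho} \;\longrightarrow\; \begin{cases} X_1^{1-w}\, X_2^{w} & \rho \to 0 \quad \text{(Cobb-Douglas)} \\ \min(X_1, X_2) & \rho \to -\infty \quad \text{(no substitution)} \end{cases}$$

The elasticity of substitution is $1/(1-\rho)$, so ρ = −2.0 corresponds to an elasticity of 1/3, and "no substitution effect" is the Leontief limit ρ → −∞.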