Congrats on getting this out! I am overall excited about exploring models that rely more on uplift than on time horizons. A few thoughts:
It might be nice to indicate how these outputs relate to your all-things-considered views. To me your explicit model seems to be implausibly confident in 99% automation before 2040.
In particular, the "doubling difficulty growth factor", which measures whether time horizon increases superexponentially, could change the date of automated coder from 2028 to 2049! I suspect that time horizon is too poorly defined to nail down this parameter, and rough estimates of more direct AI capability metrics like uplift can give much tighter confidence intervals.
I am skeptical that uplift measurements actually give much tighter confidence intervals.
After talking this over with Thomas, we both agree that directly using uplift measurements rather than time horizon could plausibly be better by the end of 2026, though we might have different intuitions about the precise likelihood.
The effective labor:compute ratio only changes by 10-100x during the period in question, so it doesn't affect the results much anyway. The fastest trajectories are the most affected by the compute:labor ratio, but for trajectories that reach 99% automation around 2034, the ratio stays around 1:1.
This isn't true in our model because we allow full coding automation. Given that this is the case in your model, Cobb-Douglas seems like a reasonable approximation.
In this post, I describe a simple model for forecasting when AI will automate AI development. It is based on the AI Futures model (AIFM), but is more understandable and robust, and makes deliberately conservative assumptions.
At current rates of compute growth and algorithmic progress, this model's median prediction is >99% automation of AI R&D in late 2032. Most simulations result in a 1,000x to 10,000,000x increase in AI efficiency and a 300x-3,000x increase in research output by 2035. I therefore suspect that existing trends in compute growth and automation will still produce extremely powerful AI on "medium" timelines, even if the full coding automation and superhuman research taste that drive the AIFM's "fast" timelines (superintelligence by ~mid-2031) don't happen.
Why make this?
Scope and limitations
First, this model doesn't treat research taste and software engineering as separate skills/tasks. As such, I see it as making predictions about timelines (time to Automated Coder or Superhuman AI Researcher), not takeoff (the subsequent time from SAR to ASI and beyond). The AIFM can model takeoff because it has a second phase where the SAR's superhuman research taste causes further AI R&D acceleration on top of coding automation. If superhuman research taste makes AI development orders of magnitude more efficient, takeoff could be faster than this model predicts.
Second, this model, like AIFM, doesn't track effects on the broader economy that feed back into AI progress the way Epoch's GATE model does.
Third, we deliberately make two conservative assumptions:
This was constructed and written up fairly quickly (about 15 hours of work), so my opinions on parameters and some of the modeling assumptions could change in the future.
The model
We assume that AI development has the following dynamics:
This implies the following model:
$$S'(t) = R(t)\, S^{1-\beta} = \left(\frac{L}{1-f}\right)^{\alpha} C^{\zeta}\, S^{1-\beta}$$

$$f(t) = \sigma\!\left(v\left(\log\big(C(t)\,S(t)\big) - \log E_{\mathrm{hac}}\right)\right)$$

where
None of the components of this model are novel to the AI forecasting literature, but I haven't seen them written up in this form.
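As a concrete illustration, here is a minimal Euler-integration sketch of these dynamics in Python. All parameter and resource values below are illustrative placeholders (not the notebook's fitted values), and compute growth is assumed exponential for simplicity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(alpha, zeta, beta, v, E_hac, L=1.0, C0=1.0, g_C=1.35,
             S0=1.0, t0=2026.0, t1=2040.0, dt=0.01):
    """Euler integration of the model above. Labor L is held constant and
    compute grows exponentially, C(t) = C0 * g_C**(t - t0); these resource
    assumptions are placeholders, not the notebook's."""
    ts = np.arange(t0, t1, dt)
    S = np.empty_like(ts)
    f = np.empty_like(ts)
    S[0] = S0
    for i, t in enumerate(ts):
        C = C0 * g_C ** (t - t0)
        # logistic automation fraction in log effective compute C(t) * S(t)
        f[i] = sigmoid(v * (np.log(C * S[i]) - np.log(E_hac)))
        if f[i] > 1 - 1e-6:  # cut trajectories off for numerical stability
            return ts[:i + 1], S[:i + 1], f[:i + 1]
        if i + 1 < len(ts):
            R = (L / (1.0 - f[i])) ** alpha * C ** zeta  # research production R(t)
            S[i + 1] = S[i] + dt * R * S[i] ** (1.0 - beta)
    return ts, S, f
```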
Parameter values
The parameters are derived from the following assumptions, which are essentially educated guesses informed by other AI timelines models and by asking around:
Each underlying parameter is sampled independently from a triangular distribution. However, because alpha, zeta, and v are obtained by transforming those draws, v is not itself triangular, and alpha and zeta are neither triangular nor independent.
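A sketch of the sampling scheme, with hypothetical (left, mode, right) bounds standing in for the actual priors (see the notebook below for the real values); the transform structure is likewise illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_params():
    # Triangular priors with ILLUSTRATIVE bounds; the raw draws below and
    # the transforms to alpha, zeta, and v are hypothetical stand-ins.
    beta = rng.triangular(0.3, 0.5, 0.7)         # diminishing returns to software
    returns = rng.triangular(1.0, 2.0, 4.0)      # hypothetical raw draw
    labor_share = rng.triangular(0.3, 0.5, 0.7)  # hypothetical raw draw
    alpha = returns * labor_share                # transformed -> not triangular
    zeta = returns * (1.0 - labor_share)         # correlated with alpha
    v = 1.0 / rng.triangular(0.5, 1.0, 2.0)      # reciprocal draw -> not triangular
    return dict(alpha=alpha, zeta=zeta, beta=beta, v=v)
```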
For more information see the notebook: https://github.com/tkwa/ai-takeoff-model/blob/main/takeoff_simulation.ipynb
Graphs
All graphs display 40 trajectories with parameters sampled according to the section Parameter Values.
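A minimal plotting sketch for such a figure, reusing the simulate and sample_params sketches above (the E_hac threshold here is an arbitrary placeholder):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for _ in range(40):
    ts, S, f = simulate(**sample_params(), E_hac=1e3)  # placeholder threshold
    ax.plot(ts, f, alpha=0.5)
ax.set_yscale("logit")  # matplotlib's built-in logit scale
ax.set_xlabel("Year")
ax.set_ylabel("Automation fraction f")
plt.show()
```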
Automation fraction f across the 40 trajectories (logit scale). Most trajectories reach 99% automation of AI R&D by the early-to-mid 2030s.
40 sampled trajectories of the model. Top left: software level S grows subexponentially (but very fast) as automation accelerates research. Top right: the parallel compute:labor ratio C/(L/(1−f)) (raw resource ratio before diminishing returns) decreases if automation is fast, but is ~constant if automation is on track for 99% by ~2034. Bottom left: research production R(t) increases by orders of magnitude. Bottom right: the serial compute:labor ratio Cζ/(L/(1−f))α (with diminishing returns exponents) trends upward. Trajectories are cut off at 99.9999% automation for numerical stability.
Sensitivity analysis: median year of 99% automation as a function of each parameter, with the other parameters sampled from their prior distributions. Higher beta (diminishing returns to software improvement) and higher 1/v (slower automation) delay 99% automation the most, while the other parameters have modest effects.
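The sensitivity analysis can be reproduced along these lines, again reusing the sketches above: sweep one parameter, resample the rest from their priors, and record the first year each trajectory crosses 99% automation:

```python
def year_of_99pct(params, E_hac=1e3):
    ts, _, f = simulate(**params, E_hac=E_hac)
    hit = np.nonzero(f >= 0.99)[0]
    return ts[hit[0]] if hit.size else np.inf  # inf if never reached

def sensitivity(name, values, n=200):
    """Median year of 99% automation as `name` is swept, with the other
    parameters resampled from their priors on every draw."""
    medians = []
    for val in values:
        years = []
        for _ in range(n):
            params = sample_params()
            params[name] = val  # pin the swept parameter
            years.append(year_of_99pct(params))
        medians.append(np.median(years))
    return medians
```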
Observations
Discussion
From playing around with this and other variations on the AI Futures model, I think any reasonable timelines model will predict superhuman AI researchers before 2036 unless AI progress hits a wall or is deliberately slowed.
In addition to refining the parameter values with empirical data, I would ideally want to backtest this model on data before 2026. However, a backtest is likely not feasible because automation was minimal before 2025, and automation of AI R&D is the main effect being modeled here.
More on modeling choices
List of differences from the AIFM
It may be useful to cross-reference this with my AIFM summary.
How could we better estimate the parameters?
We can get f(2026) [uplift fraction in 2026] from
v [velocity of automation as capabilities improve] can be obtained by
Why is automation logistic?
Why are labor and compute Cobb-Douglas?
In the AIFM, the median estimate for substitutability between labor and compute is −0.15, and the plausible range includes zero (which would be Cobb-Douglas). I asked Eli why they didn't just say it was Cobb-Douglas, and his answer was roughly that Cobb-Douglas gives infinite progress if one of labor/compute goes to infinity while the other stays constant, which is implausible. I have two responses to this:
Why is there no substitutability between tasks?
The AIFM's median was something like ρ = −2.0, meaning very weak substitution between tasks. To be conservative, I assumed no substitution effect.
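For reference, the standard CES aggregator and its limits (a textbook identity, not specific to this post): with inputs $X_1, X_2$ and weight $w$,

$$R = \big((1-w)\,X_1^{\rho} + w\,X_2^{\rho}\big)^{1/\rho} \;\longrightarrow\; \begin{cases} X_1^{1-w}\, X_2^{w} & \rho \to 0 \quad \text{(Cobb-Douglas)} \\ \min(X_1, X_2) & \rho \to -\infty \quad \text{(no substitution)} \end{cases}$$

The elasticity of substitution is $1/(1-\rho)$, so ρ = −2.0 corresponds to an elasticity of 1/3, and "no substitution effect" is the Leontief limit ρ → −∞.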