(This post is now live on the METR website in a slightly edited form)
In the 9 months since the METR time horizon paper came out (during which AI time horizons have increased by ~6x), it has generated lots of attention as well as various criticism on LW and elsewhere. As one of the main authors, I think much of the criticism is a valid response to misinterpretations, and I want to list my beliefs about the limitations of our methodology and of time horizon more broadly. This is not a complete list, but rather whatever I thought of in a few hours.
Figure 1: What it feels like making benchmarks before frontier models saturate them
Despite these limitations, what conclusions do I still stand by?
[1] see eg DeepSeek R1 paper: https://arxiv.org/abs/2501.12948
After @Daniel Kokotajlo invited me to the AI Futures office I ended up talking to Eli and Alex for about an hour, and feel like I have a decent understanding of the model:
(not necessarily that I disagree, just need to think about it more)
Thoughts in no particular order:
E.g. looking at transcripts to determine where humans are spending their time when they give Cursor tasks of a certain length
I didn't really define software intelligence explosion, but had something in mind like "self-reinforcing gains from automated research causing capabilities gains in 6 months to be faster than the labor/compute scaleup-driven gains in the 3 years from 2023-2025", and the question I was targeting with the second part was "After the initial speed-up from ASARA, does the pace of progress accelerate or decelerate as AI progress feeds back on itself?"
A 23.5x improvement alone seems like it would qualify as a major explosion if it happened in a short enough period of time
Seems about true. I claim that the nanogpt speedrun suggests this is only likely if future AI labor is exponentially faster at doing research than current humans, with many caveats of course, and I don't really have an opinion on that.
We already know that there is of course a fundamental limit to how fast you can make an algorithm, so the question is always "how close to optimal are current algorithms?". It should be our very strong prior that any small subset of frontier model training will hit diminishing returns much more quickly than the whole.
This is not as small a subset of training as you might think. The 53 optimizations in the NanoGPT speedrun touched basically every part of the model, including the optimizer, embeddings, attention, other architectural details, quantization, hyperparameters, code optimizations, and PyTorch version. The two main things that limit a comparison to frontier AI are scale and data improvements. It's known that many tricks work at large scale but not at small scale. If you believe the initial 15x speedup is analogous and that the larger scale gives you a faster rate of improvement, then maybe we get something like a 100x speedup on top of our current algorithms? But I don't really believe that the original nanoGPT, which was a 300-line repo written to be readable rather than efficient [1], is analogous to our current state. If there were a bunch of low-hanging fruit that could give strongly superlinear returns, we would see 3x/year efficiency gains with only small increases in labor or compute over time, but we actually require a ~5x/year compute increase and a ~3x/year labor increase.
A software intelligence explosion is completely possible with linear speedups in cumulative effort. Indeed, it is possible with sublinear increases in cumulative effort.
Agree I was being a bit sloppy here. The derivative being infinite is not relevant in Davidson's model or my mind, it's whether the pace of progress accelerates or decelerates. It could still be very fast as it decelerates, but I'm not really thinking in enough detail to model these borderline cases, so maybe we should think of the threshold for very fast software-driven progress as r > 0.75 or something rather than r > 1.
Diminishing returns in the NanoGPT speedrun:
To determine whether we're heading for a software intelligence explosion, one key variable is how much harder algorithmic improvement gets over time. Luckily someone made the NanoGPT speedrun, a repo where people try to minimize the amount of time on 8x H100s required to train GPT-2 124M down to 3.28 loss. The record has improved from 45 minutes in mid-2024 down to 1.92 minutes today, a 23.5x speedup. This does not give the whole picture-- the bulk of my uncertainty is in other variables-- but given this is existing data it's worth looking at.
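As a quick sanity check on those numbers, the arithmetic looks like the sketch below (the elapsed-time figure is my rough assumption, not something taken from the speedrun repo):

```python
# Back-of-envelope check on the NanoGPT speedrun numbers.
first_record_min = 45.0     # minutes, first record (mid-2024)
current_record_min = 1.92   # minutes, current record

total_speedup = first_record_min / current_record_min    # ~23.4x
months_elapsed = 18                                       # assumption: mid-2024 to now
annualized = total_speedup ** (12 / months_elapsed)       # rate if the trend were smooth

print(f"total speedup: {total_speedup:.1f}x, annualized: ~{annualized:.1f}x/year")
```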
I only spent a couple of hours looking at the data [3], but there seem to be sharply diminishing marginal returns, which is some evidence against a software-only singularity.
At first, improvements were easy to make without adding many lines of code, but later records required more and more LoC for increasingly small gains, which means very strong diminishing returns-- speedup is actually sublinear in lines of code. This could be an artifact related to the very large elbow early on, but I mostly believe it.
If we instead look at number of stars as a proxy for amount of attention on the project [4], there are no diminishing returns. The data basically suggest speedup is linear in effort [1], which is consistent with a world where 3x/year increases in labor and compute are required to sustain the historical trend of ~3x/year algorithmic speedups observed by Epoch. However, this still points against a software intelligence explosion, which would require superlinear speedups for linear increases in cumulative effort.
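If you want to check this quantitatively rather than eyeballing graphs, a log-log fit works; here's a minimal sketch with placeholder numbers (the arrays below are made up for illustration, not the actual record data-- swap in the real leaderboard data to reproduce the analysis):

```python
import numpy as np

# Placeholder data, one entry per speedrun record: cumulative speedup over the
# original run, cumulative lines of code changed, and repo stars at record time.
speedup = np.array([1.0, 1.6, 2.5, 4.0, 7.0, 12.0, 18.0, 23.5])
cum_loc = np.array([1, 50, 150, 400, 900, 2000, 4000, 8000])
stars   = np.array([100, 300, 700, 1200, 1800, 2500, 3100, 3700])

# Fit log(speedup) = a + b * log(x). An exponent b near 1 means roughly linear
# returns, b < 1 sublinear (diminishing returns), b > 1 superlinear.
for name, x in [("cumulative LoC", cum_loc), ("stars", stars)]:
    b, a = np.polyfit(np.log(x), np.log(speedup), 1)
    print(f"speedup vs {name}: fitted exponent ~ {b:.2f}")
```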
Given that the speedup-vs-stars and speedup-vs-improvement-# graphs are linear but speedup-vs-LoC is sublinear, our guess should be that returns to research output are somewhat sublinear. In the language of Davidson's semi-endogenous growth model, this means r is somewhat below 1 [2]. Of course there are massive caveats about extrapolation to future models.
In Davidson's model, the requirement for a software intelligence explosion after research is automated is rαλ > 1, where λ represents inefficiency of parallel work and α is the elasticity of research output to cognitive labor at a fixed compute budget. If r < 1, this mathematically means rαλ < 1 (since α and λ are at most 1) and we don't get an SIE.
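Spelling out the version of the model I have in mind (this is my paraphrase, with the symbols as defined above, so the exact parameterization may differ slightly from Davidson's write-up):

```latex
\begin{align*}
R &\propto L^{\alpha\lambda}
  && \text{effective research input from parallel cognitive labor } L \\
\frac{\dot S}{S} &\propto R\, S^{-1/r}
  && \text{law of motion for software level } S \\
\dot S &\propto S^{\,1-1/r}
  && \text{at constant inputs: exponential as } r\to\infty,\ \text{linear at } r=1,\ \text{sublinear if } r<1 \\
\frac{\dot S}{S} &\propto S^{\,\alpha\lambda - 1/r}
  && \text{once research is automated, } L \propto S \\
\text{SIE} &\iff \alpha\lambda - \tfrac{1}{r} > 0 \iff r\alpha\lambda > 1
  && \text{growth rate of } S \text{ accelerates over time}
\end{align*}
```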
So I think an SIE will only happen if one or more of the below is true:
[1]: This was previously observed in a tweet from Epoch in February but now we have about twice the data.
[2]: r = ∞ would mean exponential improvements, while r = 1 implies linear improvement over time at constant labor/compute. So r < 1 means improvements are actually slower than linear.
[3]: A few minutes ideating, almost an hour writing a prompt for Claude 4.5 Opus, then 30 minutes making graphs and such.
[4]: It's unclear whether to say that stars represent instantaneous effort or total cumulative effort on the project. If we interpret it as instantaneous effort, then we would see diminishing returns. Also it's unclear whether stars are measuring effective research input or raw parallel labor L; if they measure raw labor (so effective input goes as L^λ with λ < 1), it might imply slightly increasing returns.
I'm giving this +1 review point despite not having originally been excited about this in 2024. Last year, I and many others were in a frame where alignment plausibly needed a brilliant idea. But since then, I've realized that execution and iteration on ideas we already have is highly valuable. Just look at how much has been done with probes and steering!
Ideas like this didn't match my mental picture of the "solution to alignment", and I still don't think it's in my top 5 directions, but with how fast AI safety has been growing, we can assign 10 researchers to each of 20 "neglected approaches" like this, so it deserves +1 point.
The post has an empirical result that's sufficient to concretize the idea and show it has some level of validity, which is necessary. Adam Jones has a critique. However, the only paper on this so far didn't make it to a main conference and only has 3 cites, so the impact isn't large (yet).
I didn't believe the theory of change at the time and still don't. The post doesn't really make a full case for it, and I doubt it really convinced anyone to work on this for the right reasons.
It may do a good job of giving the author's perspective, but given all these gaps it's not very memorable today in explaining the risks OP cares about-- even if we do end up worrying about them in 5-10 years.
To clarify, I'm not very confident that AI will be aligned; I still have a >5% p(takeover doom | 10% of AI investment is spent on safety). I'm not really sure why it feels different emotionally, but I guess this is just how brains are sometimes.
I'm glad to see this post come out. I've previously opined that solving these kinds of problems is what proves a field has become paradigmatic:
Paradigms gain their status because they are more successful than their competitors in solving a few problems that the group of practitioners has come to recognize as acute. -- Thomas Kuhn
Across scientific fields, it has been shown many times that a method which can solve these kinds of proxy tasks is more likely to lead to a real application. The approaches sketched out here seem like a particularly good fit for a large lab like GDM, because the North Star can be somewhat legible and the team has enough resources to tackle a series of proxy tasks that are relevant and impressive. Not that it would be a bad fit elsewhere either.
The simple model I mentioned on Slack (still WIP, hopefully to be written up this week) tracks capability directly in terms of labor speedup and extrapolates that. Of course, for a more serious timelines forecast you have to ground it in some data.
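To give a sense of the shape, here is a toy illustration of mine of "track labor speedup and extrapolate it"-- it is not the actual WIP model, and every number and name in it is a placeholder:

```python
# Toy version of tracking labor speedup directly and extrapolating it.
# All parameters below are placeholders, not values from the actual model.
current_speedup = 1.10    # assumed: AI currently makes R&D labor ~10% faster
growth_per_year = 1.5     # assumed: the "extra speed" (speedup - 1) grows 1.5x/year

def extrapolated_speedup(years_from_now: float) -> float:
    """Labor speedup after extrapolating the assumed exponential trend."""
    return 1 + (current_speedup - 1) * growth_per_year ** years_from_now

for t in range(6):
    print(f"year +{t}: ~{extrapolated_speedup(t):.2f}x labor speedup")
```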
Here's what I said to Eli on Slack; I don't really have more thoughts since then
we can get f_2026 [uplift fraction in 2026] from
v [velocity of automation as capabilities improve] can be obtained by