The authors don't seem to address the possibility that we are seeing a temporary acceleration in AI because the labs are ramping up methods that are much more expensive to scale, and are doing so from very low baselines.
Here is some evidence for you.
1- The acceleration in ECI precedes AI coding tools meaningfully helping researchers by at least 18 months. Based on my anecdotes, I doubt any researcher at an AI lab was being accelerated by AI before they got access to models like 4.5 Sonnet and GPT-5-Codex. Epoch says: "AI capabilities accelerated in 2024! According to our Epoch Capabilities Index, frontier model improvement nearly doubled, from ~8 points/year to ~15 points/year." I don't think there's any reason to believe that AI-aided R&D acceleration has happened in any meaningful way, other than maybe Sholto's comment.
2- One place where there has been an acceleration is my own spending on AI. I am now spending more than one thousand dollars on tokens, and the marginal task of my job that I am automating with AI costs what I used to pay for AI over an entire month. Toby Ord argues that the costs of AI are increasing exponentially: "the hourly costs for some models are now close to human costs." While the evidence is thin and further work is needed, if each capability jump makes the marginal task exponentially more expensive, while prices for a fixed level of intelligence fall 90% per year, one could imagine a world where we achieve AGI in 2028 but can only deploy it economically in 2030, and where we achieve the Automated Coder in 2031 but can only deploy it economically in 2035.
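To make that arithmetic concrete, here is a minimal sketch. The dollar figures are invented for illustration; the only structural assumption taken from the argument above is a fixed capability level whose price falls 90% per year:

```python
def years_until_economical(initial_cost, human_cost, annual_decline=0.90):
    """Years until an AI task whose price falls by `annual_decline` per year
    becomes cheaper than the human-equivalent cost."""
    cost, years = initial_cost, 0
    while cost > human_cost:
        cost *= (1 - annual_decline)
        years += 1
    return years

# Hypothetical numbers: an AGI-level task costs $100k in 2028 vs. $1k for a human.
print(2028 + years_until_economical(100_000, 1_000))      # -> 2030

# Hypothetical numbers: an Automated-Coder task costs $10M in 2031 vs. $1k for a human.
print(2031 + years_until_economical(10_000_000, 1_000))   # -> 2035
```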
3- Despite the METR and ECI capability indexes following an exponential, and even an accelerating one, the underlying trends have changed massively.
a- Pretraining scaling has slowed down massively since the GPT-4.5 debacle.
b- Massive efforts have gone into creating human-curated data around the matters we care about. SemiAnalysis says the labs are spending single-digit billions on human-generated data. Beren argues that most algorithmic progress is data progress. Obviously, replacing a corpus of text from random dudes debating in a 2007 forum with all the intermediate steps of a math proof written by a math PhD improves the models. Just as obviously, this can't scale and is a one-off improvement.
c- Inference-time scaling has been improving the models considerably, to the point that I consider OpenAI models like GPT-5.2-Codex-High unusable, given how slow they are. Not only that, but gains from inference-time scaling must be paid for every time they are executed. I don't think we can continue to scale inference-time compute into the back half of the decade.
d- Toby Ord also argues that RL is on the order of 1,000,000x less compute-efficient than pre-training. He says: "I estimate that at the time of writing (Oct 2025), we’ve already seen something like a 1,000,000x scale-up in RL training and it required ≤2x the total training cost. But the next 1,000,000x scale-up would require 1,000,000x the total training cost, which is not possible in the foreseeable future." Regardless of the exact level, I feel anyone paying attention feels the same way. Ilya argues that RL is learning through a straw.
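Toby Ord's point in d- is easier to see with a toy calculation. The starting share of RL compute is an assumption for illustration, not a measured number:

```python
# Hypothetical compute units, with pretraining normalized to 1.0.
pretrain = 1.0
rl_start = 1e-6                      # assume RL starts at ~a millionth of pretraining

rl_scaled = rl_start * 1e6           # the 1,000,000x RL scale-up we've already seen
total_before = pretrain + rl_start   # ~1.0
total_after = pretrain + rl_scaled   # ~2.0

rl_next = rl_scaled * 1e6            # the *next* 1,000,000x scale-up
total_next = pretrain + rl_next      # ~1,000,001

print(total_after / total_before)    # ~2.0: the first 1,000,000x roughly doubled the bill
print(total_next / total_before)     # ~1,000,000: the next one multiplies it a million-fold
```

The first factor of a million is nearly free because RL starts from a tiny base; the second is not, because RL now dominates the bill.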
3a- Google DeepMind co-founder and CEO, Nobel Prize winner Demis Hassabis said he is spending most of his time on world models. Facebook AI Research co-founder Yann LeCun says "LLMs are a dead end" and is working on world models. I feel that the "straight lines on charts" crowd, which I am definitely part of, ignores the important role of empiricism in the construction of human knowledge. We won't create an LLM with a one-month time horizon that then reasons from first principles how to cure cancer. That's exactly the opposite lesson from Samuel Albaine's Compute Theory of Everything.
4- The authors don't address that they are making a somewhat unverifiable prediction. The longest tasks inside the METR suite are on the order of 16 hours. I'd argue that the difficulty of benchmarking longer-horizon tasks translates into difficulty in improving the models themselves.
4a- I can imagine doing RLVR against open-ended goals, the way Stockfish keeps getting better at chess. Maybe we can have LLMs that get ever better at creating matrix factorization algorithms. I struggle to see which algorithms of that kind exist where overshooting human capability would be insanely, singularity-level good.
4b- RL doesn't seem to generalize. My market "Will a large language model beat a super grandmaster playing chess by EOY 2028?" is at 44% and trending down. Maxim Saplin's leaderboard of LLM chess has Gemini 3 Pro at a mere 1033 rating, versus 1500 for a "class C player". While I have no doubt that the labs could RLVR chess into their LLMs if they wanted to, I think chess is a good example that you can't do insane amounts of RL in one direction and expect good things in other directions.
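For a sense of scale, the standard Elo expected-score formula puts a ~1033-rated player's chances against a ~2700-rated super grandmaster at roughly one in fifteen thousand per game. The leaderboard's internal rating and FIDE Elo are not directly comparable, so treat this purely as an order-of-magnitude illustration:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Gemini 3 Pro's ~1033 leaderboard rating vs. an assumed ~2700 super-grandmaster rating.
print(elo_expected_score(1033, 2700))  # ~6.8e-05 expected score per game
```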
5- I'd argue a "significantly more important than the internet" singularity requires solving one or more of continual learning and simulation (a.k.a. world models). Computers will only get quickly better at matters that involve the real world if they aren't bounded by the real world.
All that said, I confess the straight lines on a chart are immensely persuasive and, via the Lindy Effect, hard not to extrapolate for many more years.
Thank you for the effort. Big fan of the authors.