In this comment, I'll try to respond at the object level, arguing for why I expect slower takeoff than "brain in a box in a basement". I'd also be down to try a dialogue/discussion at some point.
1.4.1 Possible counter: “If a different, much more powerful, AI paradigm existed, then someone would have already found it.”
I think of this as a classic @paulfchristiano-style rebuttal (see e.g. Yudkowsky and Christiano discuss "Takeoff Speeds", 2021).
In terms of reference class forecasting, I concede that it’s rather rare for technologies with extreme profit potential to have sudden breakthroughs unlocking massive new capabilities (see here), that “could have happened” many years earlier but didn’t. But there are at least a few examples, like the 2025 baseball “torpedo bat”, wheels on suitcases, the original Bitcoin, and (arguably) nuclear chain reactions.[7]
I think the way you describe this argument isn't quite right. (More precisely, I think the argument you give may be a (weaker) counterargument that people sometimes make, but I think there is a nearby argument which is much stronger.)
Here's how I would put this:
Prior to having a complete version of this much more powerful AI paradigm, you'll first have a weaker version of this paradigm (e.g. you haven't figured out the most efficient way to implement the brain's algorithm, etc.). Further, the weaker version of this paradigm might initially be used in combination with LLMs (or other techniques) such that it (somewhat continuously) integrates into the old trends. Of course, large paradigm shifts might cause things to proceed substantially faster or bend the trend, but not necessarily.
Further, we should still broadly expect this new paradigm will itself take a reasonable amount of time to transition through the human range and through different levels of usefulness, even if it's very different from LLM-like approaches (or other AI tech). And we should expect this probably happens at massive computational scale, where it will first be viable given some level of algorithmic progress (though this depends on the relative difficulty of scaling things up versus improving the algorithms). As in, more than a year prior to the point where you can train a superintelligence on a gaming GPU, I expect someone will train a system which can automate big chunks of AI R&D using a much bigger cluster.
On this prior point, it's worth noting that many of Paul's original points in Takeoff Speeds are totally applicable to non-LLM paradigms, as is much of the discussion in Yudkowsky and Christiano discuss "Takeoff Speeds". (And I don't think you compellingly respond to these arguments.)
I think your response is that you argue against these perspectives under 'Very little R&D separating “seemingly irrelevant” from ASI'. But, I just don't find these specific arguments very compelling. (Maybe also you'd say that you're just trying to lay out your views rather than compellingly arguing for them. Or maybe you'd say that you can't argue for your views due to infohazard/forkhazard concerns. In which case, fair enough.) Going through each of these:
I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains). How much is “very little”? I dunno, maybe 0–30 person-years of R&D? Contrast that with AI-2027’s estimate that crossing that gap will take millions of person-years of R&D.
Why am I expecting this? I think the main reason is what I wrote about the “simple(ish) core of intelligence” in §1.3 above.
I don't buy that having a "simple(ish) core of intelligence" means it won't take a long time to get the resulting algorithms. I'd say that much of what makes modern LLMs work does have a simple core, and you could transmit this in a short 30-page guide, but nonetheless, it took many years of R&D to reach where we are now. Also, I'd note that the brain seems way more complex than LLMs to me!
For a non-imitation-learning paradigm, getting to “relevant at all” is only slightly easier than getting to superintelligence
My main response would be that basically all paradigms allow for mixing imitation with reinforcement learning. And, it might be possible to mix the new paradigm with LLMs which would smooth out / slow down takeoff.
You note that imitation learning is possible for brains, but don't explain why we won't be able to mix the brain-like paradigm with more imitation than human brains use, which would smooth out takeoff. As in, yes, human brains don't use as much imitation as LLMs, but they would probably perform better if you modified the algorithm some and did 10^26 FLOP worth of imitation on the best data. This would smooth out the takeoff.
Why do I think getting to “relevant at all” takes most of the work? This comes down to a key disanalogy between LLMs and brain-like AGI, one which I’ll discuss much more in the next post.
I'll consider responding to this in a comment responding to the next post.
Edit: it looks like this is just the argument that LLM capabilities come from imitation, due to transforming observations into behavior in a way humans don't. I basically just think that you could also leverage imitation more effectively to get performance earlier (and thus at a lower level) with an early version of a more brain-like architecture, and I expect people would do this in practice to see earlier returns (even if the brain doesn't do this).
Instead of imitation learning, a better analogy is to AlphaZero, in that the model starts from scratch and has to laboriously work its way up to human-level understanding.
Notably, in the domains of chess and Go it actually took many years to make it through the human range. And, it was possible to leverage imitation learning and human heuristics to perform quite well at Go (and chess) in practice, up to systems which weren't that much worse than humans.
it takes a lot of work to get AlphaZero to the level of a skilled human, but then takes very little extra work to make it strongly superhuman.
AlphaZero exhibits returns which are maybe like 2-4 SD (within the human distribution of Go players supposing ~100k to 1 million Go players) per 10x-ing of compute.[1] So, I'd say it probably would take around 30x to 300x additional compute to go from skilled human (perhaps 2 SD above median) to strongly superhuman (perhaps 3 SD above the best human or 7.5 SD above median) if you properly adapted to each compute level. In some ways 30x to 300x is very small, but also 30x to 300x is not that small...
In practice, I expect returns more like 1.2 SD / 10x of compute at the point when AIs are matching top humans. (I explain this in a future post.)
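To make the arithmetic explicit, here's a rough back-of-the-envelope sketch (the ~5.5 SD gap and the SD-per-10x rates above are the assumed inputs; exact endpoints depend on where you place the thresholds):

```python
# Back-of-the-envelope: compute multiplier implied by a constant "SDs per 10x of compute" rate.
def compute_multiplier(sd_gap: float, sd_per_oom: float) -> float:
    """Extra compute factor needed to climb sd_gap standard deviations."""
    return 10 ** (sd_gap / sd_per_oom)

sd_gap = 7.5 - 2.0  # "skilled human" (~+2 SD) to "strongly superhuman" (~+7.5 SD above median)
for rate in (4.0, 2.0, 1.2):  # assumed SDs gained per 10x of compute
    print(f"{rate} SD per 10x -> ~{compute_multiplier(sd_gap, rate):,.0f}x more compute")
# ~24x at 4 SD/10x and ~560x at 2 SD/10x (the same ballpark as the 30x to 300x above),
# versus tens of thousands of times more compute at the 1.2 SD/10x rate I expect near the top of the human range.
```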
1.7.2 “Plenty of room at the top”
I agree with this.
1.7.3 What’s the rate-limiter?
[...]My rebuttal is: for a smooth-takeoff view, there has to be some correspondingly-slow-to-remove bottleneck that limits the rate of progress. In other words, you can say “If Ingredient X is an easy huge source of AGI competence, then it won’t be the rate-limiter, instead something else will be”. But you can’t say that about every ingredient! There has to be a “something else” which is an actual rate-limiter, that doesn’t prevent the paradigm from doing impressive things clearly on track towards AGI, but that does prevent it from being ASI, even after hundreds of person-years of experimentation.[13] And I’m just not seeing what that could be.
Another point is: once people basically understand how the human brain figures things out in broad outline, there will be a “neuroscience overhang” of 100,000 papers about how the brain works in excruciating detail, and (I claim) it will rapidly become straightforward to understand and integrate all the little tricks that the brain uses into AI, if people get stuck on anything.
I'd say that the rate limiter is that it will take a while to transition from something like "1000x less compute efficient than the human brain (as in, it will take 1000x more compute than a human lifetime to match top human experts, though the AIs will simultaneously be better at a bunch of specific tasks)" to "as compute efficient as the human brain". Like, the actual algorithmic progress for this will take a while, and I don't buy your claim that the way this will work is that you'll go from nothing to having an outline of how the brain works, at which point everything will immediately come together due to the neuroscience literature. Like, I think something like this is possible, but unlikely (especially prior to having AIs that can automate AI R&D).
And, while you have much less efficient algorithms, you're reasonably likely to get bottlenecked on either how fast you can scale up compute (though this is still pretty fast, especially if all those big datacenters for training LLMs are still just lying around!) or how fast humanity can produce more compute (which can be much slower).
Part of my disagreement is that I don't put the majority of the probability on "brain-like AGI" (even if we condition on something very different from LLMs) but this doesn't explain all of the disagreement.
It looks like a version of AlphaGo Zero goes from 2400 ELO (around 1000th best human) to 4000 ELO (somewhat better than the best human) between hours 15 and 40 of the training run (see Figure 3 in this PDF). So, naively this is a bit less than 3x compute for maybe 1.9 SDs (supposing that the "field" of Go players has around 100k to 1 million players), implying that 10x compute would get you closer to 4 SDs. However, in practice, progress around the human range was slower than 4 SDs/OOM would predict. Also, comparing times to reach particular performances within a training run can sometimes make progress look misleadingly fast due to LR decay and suboptimal model size. The final version of AlphaGo Zero used a bigger model size and ran RL for much longer, and it seemingly took more compute to reach ~2400 and ~4000 ELO, which is some evidence for optimal model size making a substantial difference (see Figure 6 in the PDF). Also, my guess based on circumstantial evidence is that the original version of AlphaGo (which was initialized with imitation) moved through the human range substantially slower than 4 SDs/OOM. Perhaps someone can confirm this. (This footnote is copied from a forthcoming post of mine.)
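For concreteness, here's the naive within-run arithmetic behind that estimate (the hours and the ~1.9 SD figure above are the assumptions):

```python
import math

compute_ratio = 40 / 15  # ~2.7x, i.e. "a bit less than 3x" more training compute between hours 15 and 40
sd_gained = 1.9          # ~1600 ELO mapped onto a "field" of ~100k to 1 million Go players
print(f"~{sd_gained / math.log10(compute_ratio):.1f} SD per 10x")  # ~4-4.5 SD per 10x on this naive comparison
```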
Prior to having a complete version of this much more powerful AI paradigm, you'll first have a weaker version of this paradigm (e.g. you haven't figured out the most efficient way to implement the brain's algorithm, etc.).
A supporting argument: Since evolution found the human brain algorithm, and evolution only does local search, the human brain algorithm must be built out of many innovations that are individually useful. So we shouldn't expect the human brain algorithm to be an all-or-nothing affair. (Unless it's so simple that evolution could find it in ~one step, but that seems implausible.)
Edit: Though in principle, there could still be a heavy-tailed distribution of how useful each innovation is, with one innovation producing most of the total value. (Even though the steps leading up to that were individually slightly useful.) So this is not a knock-down argument.
My claim was “I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains).”
I don’t think anything about human brains and their evolution cuts against this claim.
If your argument is “brain-like AGI will work worse before it works better”, then sure, but my claim is that you only get “impressive and proto-AGI-ish” when you’re almost done, and “before” can be “before by 0–30 person-years of R&D” like I said. There are lots of parts of the human brain that are doing essential-for-AGI stuff, but if they’re not in place, then you also fail to pass the earlier threshold of “impressive and proto-AGI-ish”, e.g. by doing things that LLMs (and other existing techniques) cannot already do.
Or maybe your argument is “brain-like AGI will involve lots of useful components, and we can graft those components onto LLMs”? If so, I’m skeptical. I think the cortex is the secret sauce, and the other components are either irrelevant for LLMs, or things that LLM capabilities researchers already know about. For example, the brain has negative feedback loops, and the brain has TD learning, and the brain has supervised learning and self-supervised learning, etc., but LLM capabilities researchers already know about all those things, and are already using them to the extent that they are useful.
To be clear: I'm not sure that my "supporting argument" above addressed an objection to Ryan that you had. It's plausible that your objections were elsewhere.
But I'll respond with my view.
If your argument is “brain-like AGI will work worse before it works better”, then sure, but my claim is that you only get “impressive and proto-AGI-ish” when you’re almost done, and “before” can be “before by 0–30 person-years of R&D” like I said.
Ok, so this describes a story where there's a lot of work to get proto-AGI and then not very much work to get superintelligence from there. But I don't understand what the argument is for thinking this is the case vs. thinking that there's a lot of work to get proto-AGI and then also a lot of work to get superintelligence from there.
Going through your arguments in section 1.7:
Thanks! Here’s a partial response, as I mull it over.
Also, I'd note that the brain seems way more complex than LLMs to me!
See “Brain complexity is easy to overstate” section here.
basically all paradigms allow for mixing imitation with reinforcement learning
As in §2.3.2, if an LLM sees output X in context Y during pretraining, it will automatically start outputting X in context Y. Whereas if smart human Alice hears Bob say X in context Y, Alice will not necessarily start saying X in context Y. Instead she might say "Huh? Wtf are you talking about Bob?"
Let’s imagine installing an imitation learning module in Alice’s brain that makes her reflexively say X in context Y upon hearing Bob say it. I think I’d expect that module to hinder her learning and understanding, not accelerate it, right?
(If Alice is able to say to herself "in this situation, Bob would say X", then she has a shoulder-Bob, and that's definitely a benefit not a cost. But that's predictive learning, not imitative learning. No question that predictive learning is helpful. That's not what I'm talking about.)
…So there’s my intuitive argument that the next paradigm would be hindered rather than helped by mixing in some imitative learning. (Or I guess more precisely, as long as imitative learning is part of the mix, I expect the result to be no better than LLMs, and probably worse. And as long as we’re in “no better than LLM” territory, I’m off the hook, because I’m only making a claim that there will be little R&D between “doing impressive things that LLMs can’t do” and ASI, not between zero and ASI.)
Notably, in the domains of chess and Go it actually took many years to make it through the human range. And, it was possible to leverage imitation learning and human heuristics to perform quite well at Go (and chess) in practice, up to systems which weren't that much worse than humans.
In my mind, the (imperfect!) analogy here would be (LLMs, new paradigm) ↔ (previous Go engines, AlphaGo and successors).
In particular, LLMs today are in many (not all!) respects “in the human range” and “perform quite well” and “aren’t that much worse than humans”.
algorithmic progress
I started writing a reply to this part … but first I’m actually kinda curious what “algorithmic progress” has looked like for LLMs, concretely—I mean, the part where people can now get the same results from less compute. Like what are the specific things that people are doing differently today than in 2019? Is there a list somewhere? A paper I could read? (Or is it all proprietary?) (Epoch talks about how much improvement has happened, but not what the improvement consists of.) Thanks in advance.
See “Brain complexity is easy to overstate” section here.
Sure, but I still think it's probably way more complex than LLMs even if we're just looking at the parts key for AGI performance (in particular, the parts which learn from scratch). And, my guess would be that performance is greatly degraded if you take only as much complexity as the core LLM learning algorithm.
Let’s imagine installing an imitation learning module in Alice’s brain that makes her reflexively say X in context Y upon hearing Bob say it. I think I’d expect that module to hinder her learning and understanding, not accelerate it, right?
This isn't really what I'm imagining, nor do I think this is how LLMs work in many cases. In particular, LLMs can transfer from training on random github repos to being better in all kinds of different contexts. I think humans can do something similar, but have much worse memory.
I think in the case of humans and LLMs, this is substantially subconscious/non-explicit, so I don't think this is well described as having a shoulder-Bob.
Also, I would say that humans do learn from imitation! (You can call it prediction, but it doesn't matter what you call it as long as it implies that data from humans makes things scale more continuously through the human range.) I just think that you can do better at this than humans, based on the LLM case, mostly because humans aren't exposed to as much data.
Also, I think the question is "can you somehow make use of imitation data", not "can the brain's learning algorithm immediately make use of imitation".
In my mind, the (imperfect!) analogy here would be (LLMs, new paradigm) ↔ (previous Go engines, AlphaGo and successors).
Notably this analogy implies LLMs will be able to automate substantial fractions of human work prior to a new paradigm which (over the course of a year or two and using vast computational resources) beats the best humans. This is very different from the "brain in a basement" model IMO. I get that you think the analogy is imperfect (and I agree), but it seems worth noting that the analogy you're drawing suggests something very different from what you expect to happen.
Is there a list somewhere? A paper I could read? (Or is it all proprietary?)
It's substantially proprietary, but you could consider looking at the DeepSeek-V3 paper. We don't actually have a great understanding of the quantity and nature of algorithmic improvement after GPT-3. It would be useful for someone to do a more up-to-date review based on the best available evidence.
My thoughts on reading this post and your second one:
Maybe something like "non-LLM AGIs are a thing too and we know from the human brain that they're going to be much more data-efficient than LLM ones"; it feels like the focus in conversation has been so strongly on LLM-descended AGIs that I just stopped thinking about that.
Promoted to curated: I think this post is good, as is the next post in the sequence. It made me re-evaluate some of the strategic landscape, and is also otherwise just very clear and structured in how it approaches things.
Thanks a lot for writing it!
But I’m in much closer agreement with that scenario than the vast majority of AI safety & alignment researchers today, who tend to see the “foom & doom” scenario above as somewhere between “extraordinarily unlikely” and “already falsified”!
Those researchers are not asking each other “is it true?”, but rather “lol, can you believe that some people used to believe that?”.[1] Oh well. Laugh all you want. It’s still what I believe.
To clarify my views:
These posts are mainly exploring my disagreement with a group of researchers who think of LLMs[2] as being on a smooth, continuous path towards ASI. This group comprises probably >95% of people working on AI alignment, safety, and governance today[3].
(For many people in this group, if you ask them directly whether there might be important changes in AI algorithms, training approaches, etc., between today and ASI, they’ll say “Oh yes, of course that’s possible”. But if you ask them any other question about the future of AI, they’ll answer as if they expect no such change.)
There’s a very short answer to why I disagree with those LLM-focused researchers on foom & doom: They expect LLMs to scale to ASI, and I don’t.
As noted above, I don't feel particularly strongly that LLMs will scale to ASI, and this isn't a very load-bearing part of my perspective.
Further, I don't think my views about continuity and slower takeoff (more like 6 months to a few years depending on what you're counting, but also with some probability on more like a decade) are that strongly driven by putting a bunch of probability on LLMs scaling to AGI / full automation of AI R&D. They're based on:
- LLM-focused AGI person: “Ah, that’s true today, but eventually other AIs can do this ‘development and integration’ R&D work for us! No human labor need be involved!”
- Me: “No! That’s still not radical enough! In the future, that kind of ‘development and integration’ R&D work just won’t need to be done at all—not by humans, not by AIs, not by anyone! Consider that there are 8 billion copies of basically one human brain design, and if a copy wants to do industrial design, it can just figure it out. By the same token, there can be basically one future AGI design, and if a copy wants to do industrial design, it can just figure it out!”
I think the LLM-focused AGI people broadly agree with what you're saying and don't see a real disagreement here. I don't see an important distinction between "AIs can figure out development and integration R&D" and "AIs can just learn the relevant skills". Like, the AIs are doing some process which results in an AI that can perform the relevant task. This could be an AI updated by some generic continual learning algorithm or an AI which is trained on a bunch of RL environments that AIs create; it doesn't ultimately make much of a difference so long as it works quickly and cheaply. (There might be a disagreement in what sample efficiency (as in, how efficiently AIs can learn from limited data) people are expecting AIs to have at different levels of automation.)
Similarly, note that humans also need to do things like "figure out how to learn some skill" or "go to school". Likewise, AIs might need to design a training strategy for themselves (if existing human training programs don't work or would be too slow), but it doesn't really matter.
Thanks! I suppose I didn’t describe it precisely, but I do think I’m pointing to a real difference in perspective, because if you ask this “LLM-focused AGI person” what exactly the R&D work entails, they’ll almost always describe something wildly different from what a human skill acquisition process would look like. (At least for the things I’ve read and people I’ve talked to; maybe that doesn’t generalize though?)
For example, if the task is “the AI needs to run a restaurant”, I’d expect the “LLM-focused AGI person” to talk about an R&D project that involves sourcing a giant set of emails and files from lots of humans who have successfully run restaurants, and fine-tuning the AI on that data; and/or maybe creating a “Sim Restaurant” RL training environment; or things like that. I.e., lots of things that no human restaurant owner has ever done.
This is relevant because succeeding at this kind of R&D task (e.g. gathering that training data) is often not quick, and/or not cheap, and/or not even possible (e.g. if the appropriate training data doesn’t exist).
(I agree that if we assert that the R&D is definitely always quick and cheap and possible, at least comparable to how quick and cheap and possible is (sped-up super-) human skill acquisition, then the precise nature of the R&D doesn’t matter much for takeoff questions.)
(Separately, I think talking about “sample efficiency” is often misleading. Humans often do things that have never been done before. That’s zero samples, right? What does sample efficiency even mean in that case?)
I agree there is a real difference, I just expect it to not make much of a difference to the bottom line in takeoff speeds etc. (I also expect some of both in the short timelines LLM perspective at the point of full AI R&D automation.)
My view is that on hard tasks humans would also benefit from stuff like building explicit training data for themselves, especially if they had the advantage of "learn once, deploy many". I think humans tend to underinvest in this sort of thing.
In the case of things like restaurant sim, the task is sufficiently easy that I expect AGI would probably not need this sort of thing (though it might still improve performance enough to be worth it).
I expect that as AIs get smarter (perhaps beyond the AGI level) they will be able to match humans at everything without needing to do explicit R&D style learning in cases where humans don't need this. But, this sort of learning might still be sufficiently helpful that AIs are ongoingly applying it in all domains where increased cognitive performance has substantial returns.
(Separately, I think talking about “sample efficiency” is often misleading. Humans often do things that have never been done before. That’s zero samples, right? What does sample efficiency even mean in that case?)
Sure, but we can still loosely evaluate sample efficiency relative to humans in cases where some learning occurs (potentially including stuff like learning on the job). As in, how well can the AI learn from some data relative to humans. I agree that if humans aren't using learning in some task then this isn't meaningful (and the distinction between learning and other cognitive abilities is itself fuzzy).
On the foom side, Paul Christiano brings up Eliezer Yudkowsky’s past expectation that ASI “would likely emerge from a small group rather than a large industry” as a failed prediction here [disagreement 12] and as “improbable and crazy” here.
Actually, I don't think Paul says this is a failed prediction in the linked text. He says:
The Eliezer predictions most relevant to “how do scientific disciplines work” that I’m most aware of are incorrectly predicting that physicists would be wrong about the existence of the Higgs boson () and expressing the view that real AI would likely emerge from a small group rather than a large industry (pg 436 but expressed many places).
My understanding is that this is supposed to be read as "[incorrectly predicting that physicists would be wrong about the existence of the Higgs boson ()] and [expressing the view that real AI would likely emerge from a small group rather than a large industry]", Paul isn't claiming that the view that real AI would likely emerge from a small group is a failed prediction!
On "improbable and crazy", Paul says:
The debate was about whether a small group could quickly explode to take over the world. AI development projects are now billion-dollar affairs and continuing to grow quickly, important results are increasingly driven by giant projects, and 9 people taking over the world with AI looks if anything even more improbable and crazy than it did then. Now we're mostly talking about whether a $10 trillion company can explosively grow to $300 trillion as it develops AI, which is just not the same game in any qualitative sense. I'm not sure Eliezer has many precise predictions he'd stand behind here (setting aside the insane pre-2002 predictions), so it's not clear we can evaluate his track record, but I think they'd look bad if he'd made them.
Note that Paul says "looks if anything even more improbable and crazy than it did then". I think your quotation is reasonable, but it's unclear if Paul thinks this is "crazy" or if he thinks it's just more incorrect and crazy-looking than it was in the past.
I just reworded from “as a failed prediction” to “as evidence against Eliezer’s judgment and expertise”. I agree that the former was not a good summary, but am confident that the latter is what Paul intended to convey and expected his readers to understand, based on the context of disagreement 12 (which you quoted part but not all of). Sorry, thanks for checking.
Oh no, I didn't realize your perspective was this gloomy. But it makes a lot of sense. Actually it mostly comes down to, you can just dispute the consensus[1] that the classically popular Yudkowskyian/Bostromian views have been falsified by the rise of LLMs. If they haven't, then fast takeoff now is plausible for mostly the same reasons that we used to think it's plausible.
I think the path from here to AGI is bottlenecked by researchers playing with toy models, and publishing stuff on arXiv and GitHub.
I think there is some merit to just asking these people to do something else. Maybe not a lot of merit, but a little more than zero, at least for some of them. Especially if they are on this site. Not with a tweet, but by using your platform here. (Plausibly you have already considered this and have good reasons for why it's a terrible idea, but it felt worth suggesting.)
I'm not sure if this is in fact a consensus, but it sure feels that way.
This is a two-post series on AI “foom” (this post) and “doom” (next post).
A decade or two ago, it was pretty common to discuss “foom & doom” scenarios, as advocated especially by Eliezer Yudkowsky. In a typical such scenario, a small team would build a system that would rocket (“foom”) from “unimpressive” to “Artificial Superintelligence” (ASI) within a very short time window (days, weeks, maybe months), involving very little compute (e.g. “brain in a box in a basement”), via recursive self-improvement. Absent some future technical breakthrough, the ASI would definitely be egregiously misaligned, without the slightest intrinsic interest in whether humans live or die. The ASI would be born into a world generally much like today’s, a world utterly unprepared for this new mega-mind. The extinction of humans (and every other species) would rapidly follow (“doom”). The ASI would then spend countless eons fulfilling its desires, desires which we humans would find to be bizarre and pointless.
Now, I don’t endorse every word of that foom & doom scenario above—for example, I don't think “foom” requires recursive self-improvement. But I’m in much closer agreement with that scenario than the vast majority of AI safety & alignment researchers today, who tend to see the “foom & doom” scenario above as somewhere between “extraordinarily unlikely” and “already falsified”!
Those researchers are not asking each other “is it true?”, but rather “lol, can you believe that some people used to believe that?”.[1] Oh well. Laugh all you want. It’s still what I believe.
Conversely, from my perspective as a foom & doomer, it’s the mainstream contemporary AI alignment discourse that feels increasingly foreign and strange. How, I ask myself, do so many seemingly reasonable people wind up with such wildly, bafflingly over-optimistic beliefs as “P(doom)≲50%”??
Anyway, my main goal in these two posts is to explore how I wind up in such a different place from most other alignment researchers today, on the question of foom & doom. I don’t particularly expect to win skeptical readers over to my side, but would at least like to convey that foom & doom is a story that hangs together and deserves a modicum of consideration.
These posts are mainly exploring my disagreement with a group of researchers who think of LLMs[2] as being on a smooth, continuous path towards ASI. This group comprises probably >95% of people working on AI alignment, safety, and governance today[3].
(For many people in this group, if you ask them directly whether there might be important changes in AI algorithms, training approaches, etc., between today and ASI, they’ll say “Oh yes, of course that’s possible”. But if you ask them any other question about the future of AI, they’ll answer as if they expect no such change.)
There’s a very short answer to why I disagree with those LLM-focused researchers on foom & doom: They expect LLMs to scale to ASI, and I don’t. Instead I expect that ASI will be a very different AI paradigm: “brain-like AGI” (more on which below and in the next post).
So if you’re an LLM-focused reader, you may be thinking: “Well, Steve is starting from a weird premise, so no wonder he gets a weird conclusion. Got it. Cool. …Why should I bother reading 15,000 more words about this topic?”
But before you go, I do think there are lots of interesting details in the story of exactly how those different starting premises (LLMs vs a different paradigm) flow down to wildly divergent views on foom & doom.
And some of those details will also, incidentally, clarify disagreements within the LLM-focused community. For example,
So I’m hopeful that these posts will have some “food for thought” for doomers like me trying to understand where those P(doom)≲50% “optimists” are coming from, and likewise for “optimists” trying to better understand doomers.
This post covers “foom”, my belief that there will be a sharp localized takeoff, in which a far more powerful and compute-efficient kind of AI emerges suddenly into an utterly unprepared world. I explore the scenario, various arguments against it and why I don’t find them compelling, and the terrifying implications (if true) on our prospects for AI governance, supervised oversight, testing, and more. Here’s the outline:
LLMs are very impressive, but they’re not AGI yet—not by my definition. For example, existing AIs are nowhere near capable of autonomously writing a business plan and then founding a company and growing it to $1B/year revenue, all with zero human intervention. By analogy, if humans were like current AIs, then humans would be able to do some narrow bits of founding and running companies by ourselves, but we would need some intelligent non-human entity (angels?) to repeatedly intervene, assign tasks to us humans, and keep the larger project on track.
Of course, humans (and groups of humans) don’t need the help of angels to conceive and carry out ambitious projects, like building businesses or going to the moon. We can do it all by ourselves. So by the same token, future AGIs (and groups of AGIs) won’t need the help of humans.
…So that’s my pitch that AGI doesn’t exist yet. And thus, the jury is still out on what AGI (and later, ASI) will look like, or how it will be made.
My expectation is that, for better or worse, LLMs will never be able to carry out those kinds of projects, even after future advances in scaffolding, post-training, and so on. If I’m right, that wouldn’t mean that those projects are beyond the reaches of AI—it’s clearly possible for some algorithm to do those things, because humans can! Rather it would mean that LLMs are the wrong algorithm class. Instead, I think sooner or later someone will figure out a different AI paradigm, and then we’ll get superintelligence with shockingly little compute, shockingly little effort, and in shockingly little time. (I’ll quantify that later.)
Basically, I think that there's a “simple(ish) core of intelligence”, and that LLMs don't have it. Instead, people are hacking together workarounds via prodigious quantities of (in Ajeya’s terminology) “scale” (a.k.a. compute, §1.5 below) and “schlep” (a.k.a. R&D, §1.7 below). And researchers are then extrapolating that process into the future, imagining that we’ll turn LLMs into ASI via even more scale and even more schlep, up to quantities of scale and schlep that strike me as ludicrously unnecessary and implausible.
The whole cortex is (more-or-less) a uniform randomly-initialized learning algorithm, and I think it’s basically the secret sauce of human intelligence. Even if you disagree with that, we can go up a level to the whole brain: the human brain algorithm has to be simple enough to fit in our (rather small) genome.[4] And not much evolution happened between us and chimps. And yet our one brain design, without modification, was able to invent farming and science and computers and rocket ships and everything else, none of which has any straightforward connection to tasks on the African savannah.
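For a rough sense of scale (back-of-the-envelope, and a generous upper bound, since only a fraction of the genome has anything to do with brain architecture):

```python
# Whole-genome information content, as a loose upper bound on how complex the
# brain's learning algorithm(s) could possibly be to specify.
base_pairs = 3.1e9                     # approximate size of the human genome
megabytes = base_pairs * 2 / 8 / 1e6   # 2 bits per base pair
print(f"~{megabytes:.0f} MB")          # ~775 MB total, a tiny specification by modern software/ML standards
```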
Anyway, the human cortex is this funny thing with 100,000,000 repeating units, each with 6-ish characteristic layers with correspondingly different neuron types and connection patterns, and so on. Nobody knows how it works. You can look up dozens of theories explaining what each of the 6-ish layers is doing and how, but they all disagree with each other. Some of the theories are supported by simulations, but those simulations are unimpressive toy models with no modern practical applications whatsoever.
Remember, if the theories were correct and complete, then they could be turned into simulations able to do all the things that the real human cortex can do[5]—vision, language, motor control, reasoning, inventing new scientific paradigms from scratch, founding and running billion-dollar companies, and so on.
So here is a very different kind of learning algorithm waiting to be discovered, one which we know can scale to AGI, and then to ASI beyond that (per §1.7.2 below). And people are working on it as we speak, and they haven’t succeeded yet, despite decades of work and billions of dollars of resources devoted to figuring it out.
(To be clear, I desperately hope they continue to fail! At least until we have a much better plan for Safe & Beneficial brain-like AGI. See especially §1.8.4 below and the next post.)
Here are three perspectives:
Another place this comes up is robotics:
Me: “Future powerful AI will already be a good robotics algorithm!”[6]
…After all, if a human wants to use a new kind of teleoperated robot, nobody needs to do a big R&D project or breed a new subspecies of human. You just take an off-the-shelf bog-standard human brain, and if it wants to pilot a new teleoperated robot, it will just autonomously figure out how to do so, getting rapidly better within a few hours. By the same token, there can be one future AGI design, and it will be able to do that same thing.
I think of this as a classic @paulfchristiano-style rebuttal (see e.g. Yudkowsky and Christiano discuss "Takeoff Speeds", 2021).
In terms of reference class forecasting, I concede that it’s rather rare for technologies with extreme profit potential to have sudden breakthroughs unlocking massive new capabilities (see here), that “could have happened” many years earlier but didn’t. But there are at least a few examples, like the 2025 baseball “torpedo bat”, wheels on suitcases, the original Bitcoin, and (arguably) nuclear chain reactions.[7]
Also, there’s long been a $1M cash bounty plus eternal fame and glory for solving the Riemann Hypothesis. Why hasn’t someone already solved it? I dunno! I guess it’s hard.
“Ah, but if companies had been putting billions of dollars into solving the Riemann Hypothesis over the last decade, as they have been doing for AI, then the Riemann Hypothesis surely would have been solved by now, right?” I dunno! Maybe! But not necessarily.
“Ah, but if the Riemann Hypothesis is that hard to solve, it must be because the proof is extraordinarily intricate and complicated, right?” I dunno! Maybe! But not necessarily. I think that lots of math proofs are elegant in hindsight, but took a lot of work to discover.
As another example, there was widespread confusion about causal inference for decades before Judea Pearl and others set us straight, with a simple and elegant framework.
So likewise, there can be a “simple(ish) core of intelligence” (§1.3 above) that is taking people a while to discover.
Of course, the strongest argument to me is the one in §1.3.1 above: the human cortex is an existence proof that there are important undiscovered insights in the world of learning algorithms.
Well, I don’t think LLMs will scale to ASI. Not with multimodal data, not with RL from Verifiable Rewards post-training, not with scaffolding, not with anything else, not soon, not ever. That’s my belief, which I won’t argue for here. Seems like we’ll find out one way or the other quite soon.
(To be clear, I could be wrong, and certainly don’t want to discourage people from contingency-planning for the possibility that souped-up future LLM systems will scale to ASI.)
I dispute the word “just”. Different ML algorithms can be quite different from each other!
I think the new paradigm will bring a shocking phase shift allowing dramatically more capabilities from dramatically less compute (see later sections), along with a shocking phase shift in the difficulty of technical alignment, including proneness to egregious scheming and deception (next post), as compared to current and future LLMs.
I have two responses.
First, I disagree with that prediction. Granted, probably LLMs will be a helpful research tool involved in finding the new paradigm, but there have always been helpful research tools, like PyTorch and arXiv and Google, and I don’t expect LLMs to be in a fundamentally different category from those other helpful research tools.
Second, even if it’s true that LLMs will discover the new paradigm by themselves (or almost by themselves), I’m just not sure I even care. I see the pre-paradigm-shift AI world as a lesser problem, one that LLM-focused AI alignment researchers (i.e. the vast majority of them) are already focusing on. Good luck to them. And I want to talk about what happens in the crazy world that we enter after that paradigm shift.
We already know that different ML approaches can have different quantitative relationships between compute and performance. For example, Fig. 7 of the classic 2020 “Scaling Laws” paper shows perplexity scaling laws for LSTMs and transformers, and they do not overlay. I expect the next paradigm to be a very different learning algorithm, so the compute-versus-performance curves that we’re used to today are just irrelevant, from my perspective. After the new paradigm, all bets are off.
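As a schematic (not the paper’s exact fit), each architecture traces out its own compute-performance power law, something like:

$$L(C) \approx a \cdot C^{-\alpha}$$

where the constant $a$ and exponent $\alpha$ both depend on the architecture. That’s why the LSTM and transformer curves in that figure don’t lie on top of each other, and why I don’t expect today’s curves to tell us anything about a genuinely different learning algorithm.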
Instead, my guess (based largely on lots of opinions about exactly what computations the human brain is doing and how) is that human-level human-speed AGI will require not a data center, but rather something like one consumer gaming GPU—and not just for inference, but even for training from scratch.
So, whereas most people would say “Groups of humans can create $1B/year companies from scratch without any divine intervention, but groups of LLMs cannot create $1B/year companies from scratch without any human intervention. Welp, I guess we need even more training compute…”
…I would instead say “The latest LLMs are the wrong AI paradigm, but next-paradigm AI will be able to do things like that, starting from random initialization, with 1000× less training compute than was being used to train LLMs in 2022![8]
I won’t defend that here; see Thoughts on hardware / compute requirements for AGI for some of my thinking.
Instead, I’ll focus on how very low training compute feeds into many of my other beliefs.
I feel strongly that it would be better if AGI were invented later than sooner (other things equal, on the current margin), because I think we have a lot more work to do on technical alignment (among many other things), and we’re making progress but are nowhere near ready, and we need to be doing this work way ahead of time (§1.8.4 below).
…But I’m not sure that actual existing efforts towards delaying AGI are helping.
I think the path from here to AGI is bottlenecked by researchers playing with toy models, and publishing stuff on arXiv and GitHub. And I don’t think most existing public advocacy against building AGI will dissuade those researchers.
The problem is: public advocacy is way too centered on LLMs, from my perspective.[9] Thus, those researchers I mentioned, who are messing around with new paradigms on arXiv, are in a great position to twist “Pause AI” type public advocacy into support for what they’re doing!
“You don’t like LLMs?”, the non-LLM AGI capabilities researchers say to the Pause AI people, “Well how about that! I don’t like LLMs either! Clearly we are on the same team!”
This is not idle speculation—almost everyone that I can think of who is doing the most dangerous kind of AI capabilities research, the kind aiming to develop a new more-powerful-than-LLM AI paradigm, is already branding their work in a way that vibes with safety. For example, see here where I push back on someone using the word “controllability” to talk about his work advancing AI capabilities beyond the limits of LLMs. Ditto for “robustness” (example), “adaptability” (e.g. in the paper I was criticizing here), and even “interpretability” (details).
I think these people are generally sincere but mistaken, and I expect that, just as they have fooled themselves, they will also successfully fool their friends, their colleagues, and government regulators. Well, the government regulators hardly matter anyway, since regulating the activity of “playing with toy models, and publishing stuff on arXiv and GitHub” is a hell of an ask—I think it’s so unlikely to happen that it’s a waste of time to even talk about it, even if it were a good idea all-things-considered.[10]
(I think non-LLM-focused x-risk outreach and education is good and worthwhile. I expect it to be only slightly helpful for delaying AGI, but “slightly helpful” is still helpful, and more importantly outreach and education has many other good effects like bolstering safety research.)
Once the new paradigm is known and developed (see below), the actors able to train ASI from scratch will probably number in the tens of thousands, spread all around the world. We’re not just talking about five giant firms with gazillion-dollar data centers, as LLM-focused people tend to imagine.
Thus, for example, if governments know where all the giant data centers are and what code they’re running—well, I guess that’s probably better than governments not knowing that. But I think it’s only marginally helpful, in itself.
(That’s not to say that there is nothing useful happening in the space of regulating AGI. There are various things that would be slightly helpful,[11] and again, slightly helpful is still helpful.)
A classic x-risk argument says that ambitious callous AGIs would be motivated to wipe out humans in order to better accomplish their goals. And then a classic anti-x-risk counterargument replies that no, wiping out humans would be a murder-suicide, because there would be no one to run the electrical grid and chip factories etc. And while murder-suicide is a possible AGI motivation, it’s a less likely motivation than the AGI having long-term goals that benefit from its own survival.
Then what’s the pro-x-risk counter-counter-argument?
One approach is to tell a story that involves AGI maneuvering into power, then the world builds ever more chips and robots over a few decades, and then human extinction happens (more in “Response to Dileep George: AGI safety warrants planning ahead” §3.3.4 or this Carl Shulman interview).
…But what I really believe is that AGIs could wipe out humans and bootstrap their way back to running the world on their own, after very little prep work—see “What does it take to defend the world against out-of-control AGIs?” §3.3.3 for details. And this hypothesis starts seeming much more plausible if there are already enough chips lying around to run hundreds of millions of human-level human-speed AGIs. And that’s what I expect to be the case.
So again, this isn’t much of a crux for doom, but I still feel like it’s an important ingredient of the picture in my head.
I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains). How much is “very little”? I dunno, maybe 0–30 person-years of R&D? Contrast that with AI-2027’s estimate that crossing that gap will take millions of person-years of R&D.
Why am I expecting this? I think the main reason is what I wrote about the “simple(ish) core of intelligence” in §1.3 above.
But here are a couple additional hints about where I’m coming from:
I’m definitely not saying that it will be easy to develop the future scary paradigm to ASI from scratch. Instead I’m talking about getting to ASI from the point where the paradigm has already crossed the threshold of being clearly relevant to AGI. (LLMs are already well past this threshold, but the future scary paradigm is obviously not.) In particular, this would be the stage where lots of people believe it’s a path to AGI in the very near future, where it’s being widely used for intellectual work, and/or it’s doing stuff clearly related to the Safe & Beneficial AGI problem, by creating visibly impressive and proto-AGI-ish useful artifacts.
It takes a lot of work to get past that threshold! Especially given the existence of LLMs. (That is: the next paradigm will struggle to get much attention, or make much money, until the next paradigm is doing things that LLMs can’t do—and LLMs can do a lot!)
Why do I think getting to “relevant at all” takes most of the work? This comes down to a key disanalogy between LLMs and brain-like AGI, one which I’ll discuss much more in the next post.
The power of LLMs comes almost entirely from imitation learning on human text. This leads to powerful capabilities quickly, but with a natural ceiling (i.e., existing human knowledge), beyond which it’s unclear how to make AI much better.
Brain-like AGI does not involve that kind of imitation learning (again, more in the next post). Granted, I expect brain-like AGI to also “learn from humans” in a loose sense, just as humans learn from other humans. But the details are profoundly different from the kind of imitation learning used by LLMs. For example, if Alice says something I don’t understand, I will be aware of that fact, and I’ll reply “huh?”. I won’t (usually) just start repeating what Alice says in that same context. Or if I do, this will not get me to any new capability that LLMs aren’t already covering much better. LLMs, after all, are virtuosos at simply repeating what they heard people say during pretraining, doing so with extraordinary nuance and contextual sensitivity.
As another suggestive example, kids growing up exposed to grammatical language will learn that language, but kids growing up not exposed to grammatical language will simply create a new grammatical language from scratch, as in Nicaraguan Sign Language and creoles. (Try training an LLM from random initialization, with zero tokens of grammatical language anywhere in its training data or prompt. It’s not gonna spontaneously emit grammatical language!) I think that’s a good illustration of why imitation learning is just entirely the wrong way to think about what’s going on with brain algorithms and brain-like AGI.
For brain-like AGI, all the potential blockers to ASI that I can imagine, would also be potential blockers for crossing that earlier threshold of being clearly relevant to AGI at all, a threshold that requires using language, performing meaningful intellectual work that LLMs can’t do, and so on.
Instead of imitation learning, a better analogy is to AlphaZero, in that the model starts from scratch and has to laboriously work its way up to human-level understanding. It can’t just regurgitate human-level understanding for free. And I think that, if it can climb up to human-level understanding, it can climb past human-level understanding too, with a trivial amount of extra R&D work and more training time—just as, by analogy, it takes a lot of work to get AlphaZero to the level of a skilled human, but then takes very little extra work to make it strongly superhuman.
And speaking of strongly superhuman:
The human brain algorithm has lots of room for capabilities improvement, including (1) more neurons, (2) speed, (3) motivation (e.g. intellectual curiosity, being interested in ideas and getting things done rather than status and gossip), (4) anything else that makes human geniuses tower over human midwits, but much more of it, (5) things like cloning, merging weight-updates from clones, high-bandwidth communication, etc. More at Response to Blake Richards: AGI, generality, alignment, & loss functions §3.2.
One Paul-Christiano-style counterargument (cf. his post “Takeoff speeds”) would be: “All those things you listed under ‘plenty of room at the top’ above for why AGIs can outperform humans—scale, speed, cloning, etc.—are things that could happen before, not after, human-level, making up for some other deficiency, as opposed to your implied suggestion that we’ll get to human-level in a human-brain-like way first, and only then rapidly scale, speed it up, clone many copies, etc.”
My rebuttal is: for a smooth-takeoff view, there has to be some correspondingly-slow-to-remove bottleneck that limits the rate of progress. In other words, you can say “If Ingredient X is an easy huge source of AGI competence, then it won’t be the rate-limiter, instead something else will be”. But you can’t say that about every ingredient! There has to be a “something else” which is an actual rate-limiter, that doesn’t prevent the paradigm from doing impressive things clearly on track towards AGI, but that does prevent it from being ASI, even after hundreds of person-years of experimentation.[13] And I’m just not seeing what that could be.
Another point is: once people basically understand how the human brain figures things out in broad outline, there will be a “neuroscience overhang” of 100,000 papers about how the brain works in excruciating detail, and (I claim) it will rapidly become straightforward to understand and integrate all the little tricks that the brain uses into AI, if people get stuck on anything.
I wind up feeling like the wall-clock time between the new paradigm being “seemingly irrelevant to AGI” and ASI is, I dunno, two years on the high side, and zero on the low side.
Specifically, on the low side, I wouldn’t rule out the possibility that a single training run is the first to surpass both the “clearly relevant to AGI” threshold and the ASI threshold, in which case they would happen basically simultaneously (perhaps within the same week).
To be clear, the resulting ASI after those 0–2 years would not be an AI that already knows everything about everything. AGI and ASI (in my opinion) aren’t about already knowing things, but rather they’re about not knowing things, yet being able to autonomously figure them out (§1.7.1 above). So the thing we get after the 0–2 years is an AI that knows a lot about a lot, and if it wants to dive deeper into some domain, it can do so, picking it up with far more speed, depth, and insight than any human could.
Think of an army of a million super-speed telepathic scaled-up John von Neumann clones. If you ask them some question about cryptocurrency, then maybe they won’t know the answer off the top of their head, because maybe it happens that there wasn’t any information about cryptocurrency in their training environment to date. But then they’ll go spend a day of wall-clock time (≈ months or years of subjective time) reading up on cryptocurrency and all its prerequisites, and playing with the code, and so on, and then they’ll have a deep, beyond-world-expert-level understanding.
Even if the next paradigm requires very few person-years of R&D to get from “clearly relevant to ASI” to “actual ASI”, it may take a long time if the individual training runs are slow. But I don’t think that will be much of a limiter.
Instead, I expect that the next paradigm will involve so little compute, and be so amenable to parallelization, that trainings from “birth” (random initialization) to adult-human-level will take maybe a couple weeks, notwithstanding the fact that human brains require decades.[14] And I think picking the low-hanging fruit of efficiency and parallelization will happen early on, probably during the earlier “seemingly irrelevant” stage—why would anyone ever run a year-long training, when they can instead spend a few months accelerating and parallelizing the algorithm, and then run the same training much faster?
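For a rough sense of the gap being claimed here (taking “decades” to mean ~30 years and “a couple weeks” to mean ~2 weeks):

$$\frac{30 \times 52\ \text{weeks}}{2\ \text{weeks}} \approx 800\times$$

i.e. roughly three orders of magnitude of combined serial speedup and parallelism relative to a human childhood.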
The wall-clock time for takeoff depends in part on people making decisions, and people could decide to go actually slowly and incrementally. Even in the “single training run” case, heck, in principle, that training run could happen over the course of a zillion years, with gradient descent being performed by one guy with an abacus. But given how little compute and R&D are involved in getting to ASI, I think the only way to get deliberate slowdown would involve excellent secrecy on the algorithms, and one group (or consortium) way in the lead, and then this one group “burns their lead” in order to do incremental testing and other safety interventions.[15]
We should keep possibilities like that in mind. But I see it as realistically making takeoff smoother by months, at best, not years.
As mentioned above, some LLM-focused people like the AI-2027 authors agree with me about takeoff being pretty sharp, with the world radically changing over the course of months rather than years. But they get that conclusion via a very different path than I do.
Recall from Bostrom (2014) the (qualitative) formula:
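Roughly (paraphrasing Bostrom):

$$\text{Rate of change of intelligence} = \frac{\text{Optimization power}}{\text{Recalcitrance}}$$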
The LLM-focused people get fast “rate of change of intelligence” under an assumption that “recalcitrance” (difficulty of improving AI) is high and steeply increasing, but the “optimization power” brought to bear on improving AI is even higher, and even more steeply increasing.
Whereas I think we’re in the wrong paradigm today, but when that changes, recalcitrance will be quite low, at least across the range from “doing anything impressive whatsoever” to ASI. So we’ll get sharp takeoff (across that range) even without any particular increase in optimization power being applied to AI research.
Of course, somewhere between “doing anything impressive whatsoever” and ASI, we’ll get AIs that can do excellent AI capabilities research. And that could make takeoff faster still.[16] But I don’t think that would change my general picture very much; it would just shorten this already-short period a bit further, by effectively clipping off the end of it.
This is an area where I kinda disagree with not just Paul Christiano but also Eliezer, who historically has seemed to put a lot of emphasis on the ability of AI to do excellent AI R&D. I think where Eliezer was coming from (see e.g. Intelligence Explosion Microeconomics (2013) p56) was: human brains are comically inefficient (in his view), and human institutions even more so, and thus AI is going to be much better than humans at AI R&D, leading to rapid self-improvement. Whereas I think that’s kinda missing the point, because by the time AI is already that good at AI R&D, we’re already past the critical and controversial part. Remember the “simple(ish) core of intelligence” in §1.3 above—I think AI will get that good at AI R&D via a kind of competence that generalizes into every other domain too.
In other words, I think that, if you understand the secret sauce of the human brain, then you straightforwardly and quickly get to an ASI at the level of a million super-speed telepathic scaled-up John von Neumann clones. Then Eliezer would respond: “Ah, but then that super John von Neumann clone army would be able to do some kick-ass AI research to make their algorithms even more powerful still!” And, yeah! That’s true! But by that point, does it even matter?
A lot of things seem to point in that direction, including:
There may be safety or ethical[17] concerns delaying the deployment of these new-paradigm AIs;
Indeed, I’m not even sure if there will be much “internal deployment” to speak of, for the same reasons. I think ASI may well arrive before the developers have really gotten past the stage of testing and exploration.
So I think the Eliezer-ish scenario where strong superintelligence escapes onto the internet, in a world otherwise much like today, is quite plausible, and is my central expectation right now.
Of course, the future world won’t be exactly like today. It will presumably have more and better chips. It will have better, cheaper, and far more widespread LLMs, and people will take them for granted, complain about them, and/or forget what life was like before them, just as we do now for cell phones and social media. The already-ongoing semantic bleaching of the terms “AGI” and “ASI” will continue, until the terms become just meaningless AI company marketing speak. Various things will happen in geopolitics. Perhaps some early version of next-paradigm-AI will be getting used profitably in e.g. the robotics sector.
…But nothing like the kind of obvious common-knowledge pre-ASI craziness envisioned in Paul-style smooth-takeoff scenarios (e.g. “There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles.”).
Needless to say, if I’m right, then we need to be doing serious prep work for this next-paradigm AI, even while this next-paradigm AI is obscure, seemingly irrelevant, and only good for running toy-model demos or unsexy niche applications. Or maybe before they’re even good for any of that!
Luckily, if the next paradigm is brain-like AGI, as I expect, then we can study brains right now, and thus have at least something to go on in understanding the nature of the threat and what to do about it. That’s of course what I’m working on myself.
The obvious, well-known problem with AI-assisted alignment research is the chicken-and-egg problem. Unaligned AIs won’t actually care about robustly solving the alignment problem. So at best, the AIs will care only about impressing us—and we have abundant empirical evidence that people can be impressed by incorrect alignment ideas. At worst, the AIs will be trying to deceive and manipulate us. See further discussion in §4 of my post “Reward Button Alignment”.
But in the context of this post, we have an additional problem on top of that: I expect that, once the next-paradigm AIs are competent enough to meaningfully contribute to alignment research at all, they will be very easily able to invent ASI. Inventing ASI will be (at that point) much, much easier than alignment research—the former will entail just a bit more iteration and putting pieces together (since we’ll already be almost there!), whereas the latter will entail tricky conceptual work, anticipating novel problems, and out-of-the-box thinking.
(I’m talking specifically about getting help from AI-of-the-next-paradigm. A different topic is getting help from LLMs. I’m all for getting help from LLMs where possible! But as I mentioned in §1.4.4 above, I expect that the role of LLMs is, and will continue to be, as a mundane productivity enhancer in the same bucket as Integrated Development Environments (IDEs), PyTorch, arXiv, google, etc.,[18] as opposed to an autonomous researcher akin to humans. I just don’t think they’ll get that good.)
@Joe Carlsmith’s “AI for AI safety” brings up three categories of things to do with AI to make the ASI risk situation better:
- Safety progress: our ability to develop new levels of AI capability safely,
- Risk evaluation: our ability to track and forecast the level of risk that a given sort of AI capability development involves, and
- Capability restraint: our ability to steer and restrain AI capability development when doing so is necessary for maintaining safety.
I don’t really see any of these things working, at least not in the form that Joe and the other people seem to be imagining. Takeoff, sharp as it is, will get very much sharper still if word gets out about how this kind of AI works, and then there’d be no time to get anything done. (And “capability restraint” via governance would be off the table, given how little compute is required, see §1.5–§1.6 above.) Or if things stay mum, then that rules out public risk evaluations, widespread alignment research, or most kinds of AI-assisted societal resilience.
Moreover, the continuous learning nature of the future paradigm (see §1 of “Sharp Left Turn” discourse: An opinionated review) would mean that “AI capabilities” are hard to pin down through capabilities elicitation—the AI might not understand something when you test it, but then later it could figure it out.
(See also §2.6 of the next post on further challenges of weaker AIs supervising stronger AIs.)
Instead, the only forms of “AI for AI safety” that seem plausible to me are much closer to what Eliezer and others were talking about a decade ago: (1) a “pivotal act” (which, as Scott Alexander points out, will feel quite different if people actually find themselves living inside the scenario that I expect), and (2) very powerful AIs with good motivations, not straightforwardly following human instructions, but rather doing what they think is best. I won’t justify that in detail; it’s out of scope.
An AI with a DSA (decisive strategic advantage) is one that could unilaterally crush or co-opt all competition, should it choose to. This would constitute a terrifying single-point-of-failure for the whole future. Thus, some people understandably wonder whether we could just, y’know, not have that happen. For example, @Joe Carlsmith’s On “first critical tries” in AI alignment: “I think we should try to make it the case that no AI system is ever in a position to kill everyone and take over the world.”
I’ll leave aside the question of whether DSAs are bad—wait, sorry, they’re definitely bad. But maybe every option is bad, in which case we would have to figure out which option is least bad.[19] Anyway, my goal in this subsection is to argue that, assuming we want to avoid a DSA, I don’t see any way to do that.
A useful notion (after Eliezer via Paul Christiano) is “free energy”, meaning unexploited opportunities that an AI might use to gain power and influence. It includes profitable opportunities that have not yet been taken. It includes chips that have neither been already hacked into, nor secured, nor had their rental price massively bid upwards. It includes brainwashable humans who have neither been already brainwashed, nor been defended against further brainwashing. Things like that.
Free energy depends on competence: the very same environment may have no free energy for a human, nor for a midwit AI, but tons of free energy for a superintelligent AI.
(Free energy also depends on motivation: an opportunity to extort humans by threatening a bioweapon would constitute “free energy” for an AI that doesn’t care about human welfare or norms, but not for an AI that does. But I’ll put that aside—that gets into offense-defense balance and other issues outside the scope of this series.)
Anyway, Paul Christiano suggests that “aligned AI systems can reduce the period of risk of an unaligned AI by … consuming the ‘free energy’ that an unaligned AI might have used to grow explosively.”
Well, my concern is that when this next paradigm goes from “basically useless” to “million super-speed scaled-up telepathic John von Neumanns” in two years, or maybe much less than two years, there’s just an extraordinary amount of free energy appearing on the scene, very fast. It’s like a Mount-Everest-sized pile of gunpowder that will definitely be consumed within a matter of months. It’s pleasant to imagine this happening via a very distributed and controlled gradual burn. But c’mon. There’s gonna be a massive explosion.
Like, suppose I’m wrong about blasting through human level, and instead we get midwit AGIs for five years, and they get deployed in a widespread, distributed way on chips around the world. Does that use up the free energy? No, because the million-John-von-Neumann ASI is still going to come along after that, and wherever it shows up, it can (if it chooses to) crush or outbid all the midwit AGIs, make crazy nanotech stuff, etc.
Ah, but what if there are not two but three steps from world-very-much-like-today to ASI? Midwit AI for a couple years, then genius AI for a couple years, then million-super-speed-John-von-Neumann ASI after that? Then I claim that at least one of those three steps will unlock an extraordinary amount of free energy, enough to easily crush everything that came before and grab unprecedented power. Ah, but what if it’s five steps instead of three? Ditto. The amount of gradualism necessary to fundamentally change this dynamic is far more gradual than I see as plausible. (Again, my central guess is that there will be no deployment at all before ASI.)
Ah, but what if we ban closed-source AI? Nope, I don’t think it helps. For one thing, that will just make takeoff even sharper in wall-clock time. For another thing, I don’t think that’s realistically enforceable, in this context where a small group with a few chips can put the pieces together into a system of vastly greater competence. For yet another thing, I think there are first-mover advantages, and an unstable dynamic in which “power begets power” for these future AIs. For example, the first AI to steal some chips will have extra competence with which to go after more chips—recall the zombie apocalypse movies, where ever more zombies can create ever more zombies. (Except that here, the zombies are superhumanly ambitious, entrepreneurial, patient, etc.) Or they can use the extra compute to self-improve in other ways, or subvert competition.
Ah, but what if some AI safely detonates the free energy by making the world resilient against other powerful AIs—e.g. it autonomously hacks into every data center on Earth, hardens the security (or just destroys the chips!), maybe deploys a “gray goo defense system” or whatever, and then deletes itself? Well, that same AI clearly had a DSA! It’s just that it didn’t use its extraordinary power to install itself as a permanent Singleton. By the same token, one could imagine good outcomes like an AI that sets up a “long reflection” and defers to the results, shutting itself down when appropriate. Or an AI could gather power and hand it over to some particular human or institution. Many possibilities. But they still involve some AI having a DSA at some point. So they still involve a giant terrifying single point of failure.
I don’t know when the next paradigm will arrive, and nobody else does either. I tend to say things like “probably 5 to 25 years”. But who knows! For what it’s worth, here are some thoughts related to why I picked those numbers:
For long-timeline readers who think “probably 5-25 years” is too low:
I don’t think 2030 is so soon that we can confidently rule out ASI by then. A lot can happen in five years. Five years is how long it took to get from “LLMs don’t even exist at all” in 2018 to GPT-4 in 2023. And that’s an underestimate of how fast things can move: the path from 2018 to GPT-4 involved a number of bottlenecks that the next paradigm won’t—particularly building huge data centers and training up a huge pool of experts in machine learning, parallelization, hardware acceleration, and so on.
If we go a bit further, the entirety of deep learning was a backwater as recently as 2012, a mere 13 years ago.
A different argument goes: “the brain is so ridiculously complicated, and we’re so far from reverse-engineering it, that brain-like AGI could very well take much longer than 25 years”. For my response to that school of thought, see Intro to Brain-Like-AGI Safety §2.8, §3.7, and §3.8. To be clear, it could be more than 25 years. Technological forecasting is very hard. Can’t rule anything out. What do I know?
For short-timeline readers who think “probably 5-25 years” is too high:
I don’t think 2050 is so far away that we can confidently rule out that ASI will take that long. See discussion in §1.4.1 above.
I’m also skeptical that people will get there in under 5 years, just based on my own inside view of where people are at right now and the pace of recent progress. But again, who knows? I don’t rule anything out.
A lot of people seem to believe that either LLMs will scale to AGI within the next couple years, or this whole AGI thing is stupid hype.
That’s just so insane to me. If AGI is 25 years away (for the sake of argument), that still obviously warrants urgent planning right now. People routinely plan that far out in every other domain—climate change, building infrastructure, investing in personal health, saving for retirement, etc.
For example, if AGI is 25 years away, then, in my estimation, I’m much more likely to die from ASI apocalypse than from all other causes combined. And I’m not even that young! This is a real thing coming up, not a far-off abstract fantasy-land scenario.
Other than that, I don’t think it’s terribly decision-relevant whether we get ASI in 5 years versus 25 years, and accordingly I don’t spend much time thinking about it. We should obviously be contingency-planning for both.
Now you know the kind of “foom” I’m expecting: the development of strong superintelligence by a small group working on a new AI paradigm, with essentially no warning and few resources, leaving us with meagre hope of constraining this radical transition via conventional balance-of-power or governance mechanisms, and very little opportunity to test and iterate on any system remotely similar to the future scary ones.
So we need to be working frantically on technical alignment, sandbox test protocols, and more generally having a plan, right now, long before the future scary paradigm seems obviously on the path to AGI.
(And no, inventing that next AI paradigm is not part of the solution, but rather part of the problem, despite the safety-vibed rhetoric of the researchers who are doing exactly that as we speak—see §1.6.1.)
I am very unhappy to hold that belief, and it’s an unpopular belief in the era of LLMs, but I still think it’s true.
If that’s not bad enough, the next post will argue that, absent some future conceptual breakthrough, this kind of AI will be egregiously misaligned, deceptive, and indifferent to whether its users, programmers, or anyone else lives or dies. Next post: doom!
Thanks Charlie Steiner, Ishaan Koratkar, Seth Herd, and Justis Mills for critical comments on earlier drafts.
For example, (1) On the foom side, Paul Christiano brings up Eliezer Yudkowsky’s past expectation that ASI “would likely emerge from a small group rather than a large industry” as evidence against Eliezer’s judgment and expertise here [disagreement 12] and as “improbable and crazy” here. (2) On the doom side, the “literal genie” / “monkey’s paw” thing, where an AI would follow a specification literally, with catastrophic consequences, as opposed to interpreting natural-language requests with common sense, has likewise largely shifted from a doomer talking point to an anti-doomer mocking point. But I still believe in both those things—see §1.7 and §2.4 respectively.
“LLM” means “Large Language Model”. I’m using it as a synonym for a big class of things, also called “foundation models”, that often include multi-modal capabilities, post-training, tool use, scaffolding, and so on.
For example, this category includes pretty much everyone at OpenAI, Anthropic, DeepMind, OpenPhil, GovAI, CSET, the AISIs, and on and on.
As another example, I just randomly opened up Alignment Forum, and had to scroll through 20 posts before I found even one that was not related to the alignment properties of today’s LLMs, or otherwise premised on LLMs scaling continuously to ASI.
More broadly, it’s increasingly common in the discourse for people to simply equate “AI” with “LLMs” (as if no other type of AI exists?), and to equate “ASI” with “ASI before 2030 via pure scaling of LLMs” (as if 2040 or 2050 were a distant abstract fantasy-land?). This leads to an endless fountain of bad takes from all sides, which I frequently complain about (1, 2, 3, 4, …).
…in conjunction with the thalamus, basal ganglia, etc.
Someone still needs to do R&D for the hardware side of robotics, but not much! Indeed, teleoperated robots seem to be quite capable and inexpensive already today, despite very low demand.
Could nuclear chain reactions have happened many years earlier? The obvious answer is no: they were bottlenecked by advances in nuclear physics. Ah, but what if we lump together the nuclear chain reactions with all the supporting theory, and ask why that whole package couldn’t have happened many years earlier? But more to the point, if a historical lack of understanding of nuclear physics was a bottleneck delaying nuclear chain reactions, isn’t it likewise possible that a current lack of understanding of [????] is a bottleneck delaying that next AI paradigm today?
The training of GPT-4 used 2e25 FLOP (source: Epoch), and it probably happened mostly during 2022.
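(For a sense of scale relative to the single-consumer-GPU regime I discuss in §1.5, here’s a rough back-of-envelope sketch; the per-GPU throughput figure is an illustrative assumption on my part, not from the Epoch source:)

```python
# Rough scale check with illustrative numbers (not from the Epoch source).
gpt4_training_flop = 2e25
consumer_gpu_flop_per_s = 1e14   # assumed ballpark sustained throughput of one consumer GPU
seconds = gpt4_training_flop / consumer_gpu_flop_per_s
years = seconds / 3.15e7         # ~3.15e7 seconds per year
print(f"~{years:,.0f} GPU-years on a single consumer GPU")  # prints roughly 6,000+ years
```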
I imagine public advocates responding by saying something like:
> Well, we could remove LLMs from the narrative, and talk in more general terms about how AGI / ASI is some future technology, to be invented at some future date, and here’s why it’s dangerous and why we should urgently prepare for it right now via safety research, institution building, etc. Indeed, we x-risk people were saying exactly that message 10 years ago, and we were saying it 20 years ago, and we were saying it all the way back to Alan Turing 75 years ago. And nobody gave a shit! The vast majority of people, even AI experts, only started paying the slightest attention to AI x-risk when the message changed to: ‘Y’know, those LLMs, the ones that you can see with your own eyes? We’re talking about those. Or maybe, at most, the next generation of those, which are already being built.’ And that message—man, it’s not even our message! It’s a mutant cousin of our message, which, being far more memetically fit, drowned out our actual more nuanced message in the popular discourse.
And … yeah, sigh, I dunno.
You can’t put nuclear secrets on arXiv, but I find it hard to imagine AI toy model papers ever winding up in that category, even if it were objectively a good idea. See also the time that the USA put export restrictions on an algorithm; not only did the restrictions utterly fail to prevent proliferation, but they were also struck down as unconstitutional!
Other examples of probably-helpful-on-the-margin governance work: (1) it would be nice if governments would publicly announce that AI companies can collaborate for safety reasons without falling afoul of antitrust law; (2) maybe something about liability, e.g. this idea? No strong opinions, I haven’t thought about it much.
Things that qualify as “impressive and proto-AGI-ish” would include helping with AI alignment research, or AI capabilities research, or bioweapons research, or unlocking huge new commercial opportunities, or even just being “visibly intelligent”. LLMs (unlike next-paradigm AIs) are already well into the “impressive and proto-AGI-ish” stage, which by the way is a much lower bar than what Redwood Research people call “transformatively useful AI”.
An important aspect is the question of whether there’s widespread belief that this paradigm is a path to AGI, versus whether it’s just another exploratory subfield of AI. As an analogy, think of probabilistic programming today—it beats a few benchmarks, and it has a few niche commercial applications, and it has some enthusiastic boosters, but mostly nobody cares. (No offense!) My claim is that, very shortly before ASI (in terms of both wall-clock time and R&D effort), the algorithms that will develop into ASI will be similarly niche. That could be true even if the algorithms have some profitable commercial applications in robotics or whatever.
Or I suppose the rate-limiter could be that there are 10,000 “something else”s; but see discussion of “simple(ish) core of intelligence” in §1.3 above.
I’m assuming 100+-fold speedup compared to humans from a mix of serial speedup, parallelization (see discussion of “parallel experiences” here), and various human inefficiencies (relative to our goals with AGI). By the way, I mentioned in §1.5 that I think training-from-scratch will be possible with extraordinarily little compute, like a single consumer GPU—and if a single consumer GPU is really all that a researcher had, then maybe training-from-scratch would take many months. But what I actually expect is that researchers will at least be using ten H100s or whatever for their training runs, which is far more powerful, while still being very inexpensive, widely available, and all-but-impossible to track or govern.
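(To spell out the arithmetic behind “maybe a couple weeks” in the main text, here’s a minimal sketch; both inputs are just the illustrative figures from this footnote, not measurements:)

```python
# Illustrative arithmetic only; both inputs are assumptions, not measurements.
human_learning_years = 20            # "decades" of human learning, rounded
for speedup in (100, 300, 1000):     # the "100+-fold" speedup described above, at a few values
    weeks = human_learning_years * 52 / speedup
    print(f"{speedup}x speedup -> ~{weeks:.0f} weeks of wall-clock training")
# 100x -> ~10 weeks; 300x -> ~3 weeks; 1000x -> ~1 week
```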
I’m stating a possibility, not saying that I expect people to actually do this. As the doomer refrain goes: “I do not expect us to die with that much dignity.”
I say “could” instead of “will” because it’s at least conceivable that humans will remain in control and choose to not have AIs work on AI capabilities research.
I expect the future-scary-paradigm AIs to have a pretty obvious (and IMO legitimate) claim to phenomenal consciousness and moral patienthood, much more than LLMs do, thanks to the future scary AIs operating on human-brain-like algorithmic principles. Of course, I don’t know whether future developers will notice or care, and if they do, I don’t know how they’ll act on it. But still, I think the general dismissal of LLM welfare today (pace Anthropic hiring one guy to think about it) is not necessarily indicative of what will happen with the next paradigm.
For the record, a poll of my X followers says that LLMs are a bigger boon to programming than IDEs, although a substantial minority disagreed. Note the obvious caveats that future LLMs will be better than today’s LLMs and that some of my X followers may not be skilled users of LLMs (or IDEs, for that matter).
E.g. Michael Nielsen’s ASI existential risk: reconsidering alignment as a goal emphasizes that multipolar AI scenarios may lead to doom via unsolvable coordination problems around destructive technologies, related to the Vulnerable World Hypothesis. That seems bad! But the DSA thing seems bad too! Again, I’m not taking a stand here, just trying to understand the situation.