In this comment, I'll try to respond at the object level, arguing for why I expect slower takeoff than "brain in a box in a basement". I'd also be down to try a dialogue/discussion at some point.
1.4.1 Possible counter: “If a different, much more powerful, AI paradigm existed, then someone would have already found it.”
I think of this as a classic @paulfchristiano-style rebuttal (see e.g. Yudkowsky and Christiano discuss "Takeoff Speeds", 2021).
In terms of reference class forecasting, I concede that it’s rather rare for technologies with extreme profit potential to have sudden breakthroughs unlocking massive new capabilities (see here), that “could have happened” many years earlier but didn’t. But there are at least a few examples, like the 2025 baseball “torpedo bat”, wheels on suitcases, the original Bitcoin, and (arguably) nuclear chain reactions.[7]
I think the way you describe this argument isn't quite right. (More precisely, I think the argument you give may be a (weaker) counterargument that people sometimes make, but I think there is a nearby argument which is much stronger.)
Here's how I would put this:
Prior to having a complete version of this much more powerful AI paradigm, you'll first have a weaker version of this paradigm (e.g. you haven't figured out the most efficient way to implement the brain's algorithm, etc.). Further, the weaker version of this paradigm might initially be used in combination with LLMs (or other techniques) such that it (somewhat continuously) integrates into the old trends. Of course, large paradigm shifts might cause things to proceed substantially faster or bend the trend, but not necessarily.
Further, we should still broadly expect this new paradigm will itself take a reasonable amount of time to transition through the human range and through different levels of usefulness, even if it's very different from LLM-like approaches (or other AI tech). And we should expect this probably happens at massive computational scale, where it will first be viable given some level of algorithmic progress (though this depends on the relative difficulty of scaling things up versus improving the algorithms). As in, more than a year prior to the point where you can train a superintelligence on a gaming GPU, I expect someone will train a system which can automate big chunks of AI R&D using a much bigger cluster.
On this prior point, it's worth noting that many of Paul's original points in Takeoff Speeds are totally applicable to non-LLM paradigms, as is much of the discussion in Yudkowsky and Christiano discuss "Takeoff Speeds". (And I don't think you compellingly respond to these arguments.)
I think your response is that you argue against these perspectives under 'Very little R&D separating “seemingly irrelevant” from ASI'. But, I just don't find these specific arguments very compelling. (Maybe also you'd say that you're just trying to lay out your views rather than compellingly arguing for them. Or maybe you'd say that you can't argue for your views due to infohazard/forkhazard concerns. In which case, fair enough.) Going through each of these:
I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains). How much is “very little”? I dunno, maybe 0–30 person-years of R&D? Contrast that with AI-2027’s estimate that crossing that gap will take millions of person-years of R&D.
Why am I expecting this? I think the main reason is what I wrote about the “simple(ish) core of intelligence” in §1.3 above.
I don't buy that having a "simple(ish) core of intelligence" means it won't take a long time to get the resulting algorithms. I'd say that much of what makes modern LLMs work does have a simple core, and you could transmit this in a short 30-page guide, but nonetheless, it took many years of R&D to reach where we are now. Also, I'd note that the brain seems way more complex than LLMs to me!
For a non-imitation-learning paradigm, getting to “relevant at all” is only slightly easier than getting to superintelligence
My main response would be that basically all paradigms allow for mixing imitation with reinforcement learning. And, it might be possible to mix the new paradigm with LLMs which would smooth out / slow down takeoff.
You note that imitation learning is possible for brains, but don't explain why we won't be able to mix the brain-like paradigm with more imitation than human brains use, which would smooth out takeoff. As in, yes, human brains don't use as much imitation as LLMs, but they would probably perform better if you modified the algorithm some and did 10^26 FLOP worth of imitation on the best data. This would smooth out the takeoff.
Why do I think getting to “relevant at all” takes most of the work? This comes down to a key disanalogy between LLMs and brain-like AGI, one which I’ll discuss much more in the next post.
I'll consider responding to this in a comment responding to the next post.
Edit: it looks like this is just the argument that LLM capabilities come from imitation, due to transforming observations into behavior in a way humans don't. I basically just think that you could also leverage imitation more effectively to get performance earlier (and thus at a lower level) with an early version of a more brain-like architecture, and I expect people would do this in practice to see earlier returns (even if the brain doesn't do this).
Instead of imitation learning, a better analogy is to AlphaZero, in that the model starts from scratch and has to laboriously work its way up to human-level understanding.
Notably, in the domains of chess and Go it actually took many years to make it through the human range. And, it was possible to leverage imitation learning and human heuristics to perform quite well at Go (and chess) in practice, up to systems which weren't that much worse than humans.
it takes a lot of work to get AlphaZero to the level of a skilled human, but then takes very little extra work to make it strongly superhuman.
AlphaZero exhibits returns which are maybe like 2-4 SD (within the human distribution of Go players supposing ~100k to 1 million Go players) per 10x-ing of compute.[1] So, I'd say it probably would take around 30x to 300x additional compute to go from skilled human (perhaps 2 SD above median) to strongly superhuman (perhaps 3 SD above the best human or 7.5 SD above median) if you properly adapted to each compute level. In some ways 30x to 300x is very small, but also 30x to 300x is not that small...
In practice, I expect returns more like 1.2 SD / 10x of compute at the point when AIs are matching top humans. (I explain this in a future post.)
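To make the arithmetic explicit, here's a rough back-of-the-envelope sketch (the ~5.5 SD gap and the SD-per-10x rates above are the assumed inputs; exact endpoints depend on where you place the thresholds):

```python
# Back-of-the-envelope: compute multiplier implied by a constant "SDs per 10x of compute" rate.
def compute_multiplier(sd_gap: float, sd_per_oom: float) -> float:
    """Extra compute factor needed to climb sd_gap standard deviations."""
    return 10 ** (sd_gap / sd_per_oom)

sd_gap = 7.5 - 2.0  # "skilled human" (~+2 SD) to "strongly superhuman" (~+7.5 SD above median)
for rate in (4.0, 2.0, 1.2):  # assumed SDs gained per 10x of compute
    print(f"{rate} SD per 10x -> ~{compute_multiplier(sd_gap, rate):,.0f}x more compute")
# ~24x at 4 SD/10x and ~560x at 2 SD/10x (the same ballpark as the 30x to 300x above),
# versus tens of thousands of times more compute at the 1.2 SD/10x rate I expect near the top of the human range.
```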
1.7.2 “Plenty of room at the top”
I agree with this.
1.7.3 What’s the rate-limiter?
[...]My rebuttal is: for a smooth-takeoff view, there has to be some correspondingly-slow-to-remove bottleneck that limits the rate of progress. In other words, you can say “If Ingredient X is an easy huge source of AGI competence, then it won’t be the rate-limiter, instead something else will be”. But you can’t say that about every ingredient! There has to be a “something else” which is an actual rate-limiter, that doesn’t prevent the paradigm from doing impressive things clearly on track towards AGI, but that does prevent it from being ASI, even after hundreds of person-years of experimentation.[13] And I’m just not seeing what that could be.
Another point is: once people basically understand how the human brain figures things out in broad outline, there will be a “neuroscience overhang” of 100,000 papers about how the brain works in excruciating detail, and (I claim) it will rapidly become straightforward to understand and integrate all the little tricks that the brain uses into AI, if people get stuck on anything.
I'd say that the rate limiter is that it will take a while to transition from something like "1000x less compute efficient than the human brain (as in, it will take 1000x more compute than a human lifetime to match top human experts, though the AIs will simultaneously be better at a bunch of specific tasks)" to "as compute efficient as the human brain". Like, the actual algorithmic progress for this will take a while, and I don't buy your claim that the way this will work is that you'll go from nothing to having an outline of how the brain works, at which point everything will immediately come together due to the neuroscience literature. Like, I think something like this is possible, but unlikely (especially prior to having AIs that can automate AI R&D).
And, while you have much less efficient algorithms, you're reasonably likely to get bottlenecked on either how fast you can scale up compute (though this is still pretty fast, especially if all those big datacenters for training LLMs are still just lying around!) or how fast humanity can produce more compute (which can be much slower).
Part of my disagreement is that I don't put the majority of the probability on "brain-like AGI" (even if we condition on something very different from LLMs) but this doesn't explain all of the disagreement.
It looks like a version of AlphaGo Zero goes from 2400 ELO (around 1000th best human) to 4000 ELO (somewhat better than the best human) between hours 15 and 40 of the training run (see Figure 3 in this PDF). So, naively this is a bit less than 3x compute for maybe 1.9 SDs (supposing that the "field" of Go players has around 100k to 1 million players), implying that 10x compute would get you closer to 4 SDs. However, in practice, progress around the human range was slower than 4 SDs/OOM would predict. Also, comparing times to reach particular performances within a training run can sometimes make progress look misleadingly fast due to LR decay and suboptimal model size. The final version of AlphaGo Zero used a bigger model size and ran RL for much longer, and it seemingly took more compute to reach ~2400 and ~4000 ELO, which is some evidence for optimal model size making a substantial difference (see Figure 6 in the PDF). Also, my guess based on circumstantial evidence is that the original version of AlphaGo (which was initialized with imitation) moved through the human range substantially slower than 4 SDs/OOM. Perhaps someone can confirm this. (This footnote is copied from a forthcoming post of mine.)
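For concreteness, here's the naive within-run arithmetic behind that estimate (the hours and the ~1.9 SD figure above are the assumptions):

```python
import math

compute_ratio = 40 / 15  # ~2.7x, i.e. "a bit less than 3x" more training compute between hours 15 and 40
sd_gained = 1.9          # ~1600 ELO mapped onto a "field" of ~100k to 1 million Go players
print(f"~{sd_gained / math.log10(compute_ratio):.1f} SD per 10x")  # ~4-4.5 SD per 10x on this naive comparison
```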
Prior to having a complete version of this much more powerful AI paradigm, you'll first have a weaker version of this paradigm (e.g. you haven't figured out the most efficient way to implement the brain's algorithm, etc.).
A supporting argument: Since evolution found the human brain algorithm, and evolution only does local search, the human brain algorithm must be built out of many innovations that are individually useful. So we shouldn't expect the human brain algorithm to be an all-or-nothing affair. (Unless it's so simple that evolution could find it in ~one step, but that seems implausible.)
Edit: Though in principle, there could still be a heavy-tailed distribution of how useful each innovation is, with one innovation producing most of the total value. (Even though the steps leading up to that were individually slightly useful.) So this is not a knock-down argument.
My claim was “I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains).”
I don’t think anything about human brains and their evolution cuts against this claim.
If your argument is “brain-like AGI will work worse before it works better”, then sure, but my claim is that you only get “impressive and proto-AGI-ish” when you’re almost done, and “before” can be “before by 0–30 person-years of R&D” like I said. There are lots of parts of the human brain that are doing essential-for-AGI stuff, but if they’re not in place, then you also fail to pass the earlier threshold of “impressive and proto-AGI-ish”, e.g. by doing things that LLMs (and other existing techniques) cannot already do.
Or maybe your argument is “brain-like AGI will involve lots of useful components, and we can graft those components onto LLMs”? If so, I’m skeptical. I think the cortex is the secret sauce, and the other components are either irrelevant for LLMs, or things that LLM capabilities researchers already know about. For example, the brain has negative feedback loops, and the brain has TD learning, and the brain has supervised learning and self-supervised learning, etc., but LLM capabilities researchers already know about all those things, and are already using them to the extent that they are useful.
To be clear: I'm not sure that my "supporting argument" above addressed an objection to Ryan that you had. It's plausible that your objections were elsewhere.
But I'll respond with my view.
If your argument is “brain-like AGI will work worse before it works better”, then sure, but my claim is that you only get “impressive and proto-AGI-ish” when you’re almost done, and “before” can be “before by 0–30 person-years of R&D” like I said.
Ok, so this describes a story where there's a lot of work to get proto-AGI and then not very much work to get superintelligence from there. But I don't understand what the argument is for thinking this is the case vs. thinking that there's a lot of work to get proto-AGI and then also a lot of work to get superintelligence from there.
Going through your arguments in section 1.7:
Thanks! Here’s a partial response, as I mull it over.
Also, I'd note that the brain seems way more complex than LLMs to me!
See “Brain complexity is easy to overstate” section here.
basically all paradigms allow for mixing imitation with reinforcement learning
As in §2.3.2, if an LLM sees output X in context Y during pretraining, it will automatically start outputting X in context Y. Whereas if smart human Alice hears Bob say X in context Y, Alice will not necessarily start saying X in context Y. Instead she might say "Huh? Wtf are you talking about Bob?"
Let’s imagine installing an imitation learning module in Alice’s brain that makes her reflexively say X in context Y upon hearing Bob say it. I think I’d expect that module to hinder her learning and understanding, not accelerate it, right?
(If Alice is able to say to herself "in this situation, Bob would say X", then she has a shoulder-Bob, and that's definitely a benefit not a cost. But that's predictive learning, not imitative learning. No question that predictive learning is helpful. That's not what I'm talking about.)
…So there’s my intuitive argument that the next paradigm would be hindered rather than helped by mixing in some imitative learning. (Or I guess more precisely, as long as imitative learning is part of the mix, I expect the result to be no better than LLMs, and probably worse. And as long as we’re in “no better than LLM” territory, I’m off the hook, because I’m only making a claim that there will be little R&D between “doing impressive things that LLMs can’t do” and ASI, not between zero and ASI.)
Notably, in the domains of chess and Go it actually took many years to make it through the human range. And, it was possible to leverage imitation learning and human heuristics to perform quite well at Go (and chess) in practice, up to systems which weren't that much worse than humans.
In my mind, the (imperfect!) analogy here would be (LLMs, new paradigm) ↔ (previous Go engines, AlphaGo and successors).
In particular, LLMs today are in many (not all!) respects “in the human range” and “perform quite well” and “aren’t that much worse than humans”.
algorithmic progress
I started writing a reply to this part … but first I’m actually kinda curious what “algorithmic progress” has looked like for LLMs, concretely—I mean, the part where people can now get the same results from less compute. Like what are the specific things that people are doing differently today than in 2019? Is there a list somewhere? A paper I could read? (Or is it all proprietary?) (Epoch talks about how much improvement has happened, but not what the improvement consists of.) Thanks in advance.
See “Brain complexity is easy to overstate” section here.
Sure, but I still think it's probably way more complex than LLMs even if we're just looking at the parts key for AGI performance (in particular, the parts which learn from scratch). And, my guess would be that performance is greatly degraded if you take only as much complexity as the core LLM learning algorithm.
Let’s imagine installing an imitation learning module in Alice’s brain that makes her reflexively say X in context Y upon hearing Bob say it. I think I’d expect that module to hinder her learning and understanding, not accelerate it, right?
This isn't really what I'm imagining, nor do I think this is how LLMs work in many cases. In particular, LLMs can transfer from training on random github repos to being better in all kinds of different contexts. I think humans can do something similar, but have much worse memory.
I think in the case of humans and LLMs, this is substantially subconscious/non-explicit, so I don't think this is well described as having a shoulder-Bob.
Also, I would say that humans do learn from imitation! (You can call it prediction, but it doesn't matter what you call it as long as it implies that data from humans makes things scale more continuously through the human range.) I just think that you can do better at this than humans, based on the LLM case, mostly because humans aren't exposed to as much data.
Also, I think the question is "can you somehow make use of imitation data", not "can the brain's learning algorithm immediately make use of imitation".
In my mind, the (imperfect!) analogy here would be (LLMs, new paradigm) ↔ (previous Go engines, AlphaGo and successors).
Notably this analogy implies LLMs will be able to automate substantial fractions of human work prior to a new paradigm which (over the course of a year or two and using vast computational resources) beats the best humans. This is very different from the "brain in a basement" model IMO. I get that you think the analogy is imperfect (and I agree), but it seems worth noting that the analogy you're drawing suggests something very different from what you expect to happen.
Is there a list somewhere? A paper I could read? (Or is it all proprietary?)
It's substantially proprietary, but you could consider looking at the DeepSeek-V3 paper. We don't actually have a great understanding of the quantity and nature of algorithmic improvement after GPT-3. It would be useful for someone to do a more up-to-date review based on the best available evidence.
My thoughts on reading this post and your second one:
Maybe something like "non-LLM AGIs are a thing too and we know from the human brain that they're going to be much more data-efficient than LLM ones"; it feels like the focus in conversation has been so strongly on LLM-descended AGIs that I just stopped thinking about that.
Promoted to curated: I think this post is good, as is the next post in the sequence. It made me re-evaluate some of the strategic landscape, and is also otherwise just very clear and structured in how it approaches things.
Thanks a lot for writing it!
But I’m in much closer agreement with that scenario than the vast majority of AI safety & alignment researchers today, who tend to see the “foom & doom” scenario above as somewhere between “extraordinarily unlikely” and “already falsified”!
Those researchers are not asking each other “is it true?”, but rather “lol, can you believe that some people used to believe that?”.[1] Oh well. Laugh all you want. It’s still what I believe.
To clarify my views:
These posts are mainly exploring my disagreement with a group of researchers who think of LLMs[2] as being on a smooth, continuous path towards ASI. This group comprises probably >95% of people working on AI alignment, safety, and governance today[3].
(For many people in this group, if you ask them directly whether there might be important changes in AI algorithms, training approaches, etc., between today and ASI, they’ll say “Oh yes, of course that’s possible”. But if you ask them any other question about the future of AI, they’ll answer as if they expect no such change.)
There’s a very short answer to why I disagree with those LLM-focused researchers on foom & doom: They expect LLMs to scale to ASI, and I don’t.
As noted above, I don't feel particularly strongly that LLMs will scale to ASI, and this isn't a very load-bearing part of my perspective.
Further, I don't think my views about continuity and slower takeoff (more like 6 months to a few years depending on what you're counting, but also with some probability on more like a decade) are that strongly driven by putting a bunch of probability on LLMs scaling to AGI / full automation of AI R&D. They're based on:
- LLM-focused AGI person: “Ah, that’s true today, but eventually other AIs can do this ‘development and integration’ R&D work for us! No human labor need be involved!”
- Me: “No! That’s still not radical enough! In the future, that kind of ‘development and integration’ R&D work just won’t need to be done at all—not by humans, not by AIs, not by anyone! Consider that there are 8 billion copies of basically one human brain design, and if a copy wants to do industrial design, it can just figure it out. By the same token, there can be basically one future AGI design, and if a copy wants to do industrial design, it can just figure it out!”
I think the LLM-focused AGI people broadly agree with what you're saying and don't see a real disagreement here. I don't see an important distinction between "AIs can figure out development and integration R&D" and "AIs can just learn the relevant skills". Like, the AIs are doing some process which results in an AI that can perform the relevant task. This could be an AI updated by some generic continual learning algorithm or an AI which is trained on a bunch of RL environments that AIs create; it doesn't ultimately make much of a difference so long as it works quickly and cheaply. (There might be a disagreement in what sample efficiency (as in, how efficiently AIs can learn from limited data) people are expecting AIs to have at different levels of automation.)
Similarly, note that humans also need to do things like "figure out how to learn some skill" or "go to school". Likewise, AIs might need to design a training strategy for themselves (if existing human training programs don't work or would be too slow), but it doesn't really matter.
Thanks! I suppose I didn’t describe it precisely, but I do think I’m pointing to a real difference in perspective, because if you ask this “LLM-focused AGI person” what exactly the R&D work entails, they’ll almost always describe something wildly different from what a human skill acquisition process would look like. (At least for the things I’ve read and people I’ve talked to; maybe that doesn’t generalize though?)
For example, if the task is “the AI needs to run a restaurant”, I’d expect the “LLM-focused AGI person” to talk about an R&D project that involves sourcing a giant set of emails and files from lots of humans who have successfully run restaurants, and fine-tuning the AI on that data; and/or maybe creating a “Sim Restaurant” RL training environment; or things like that. I.e., lots of things that no human restaurant owner has ever done.
This is relevant because succeeding at this kind of R&D task (e.g. gathering that training data) is often not quick, and/or not cheap, and/or not even possible (e.g. if the appropriate training data doesn’t exist).
(I agree that if we assert that the R&D is definitely always quick and cheap and possible, at least comparable to how quick and cheap and possible is (sped-up super-) human skill acquisition, then the precise nature of the R&D doesn’t matter much for takeoff questions.)
(Separately, I think talking about “sample efficiency” is often misleading. Humans often do things that have never been done before. That’s zero samples, right? What does sample efficiency even mean in that case?)
I agree there is a real difference, I just expect it to not make much of a difference to the bottom line in takeoff speeds etc. (I also expect some of both in the short timelines LLM perspective at the point of full AI R&D automation.)
My view is that on hard tasks humans would also benefit from stuff like building explicit training data for themselves, especially if they had the advantage of "learn once, deploy many". I think humans tend to underinvest in this sort of thing.
In the case of things like restaurant sim, the task is sufficiently easy that I expect AGI would probably not need this sort of thing (though it might still improve performance enough to be worth it).
I expect that as AIs get smarter (perhaps beyond the AGI level) they will be able to match humans at everything without needing to do explicit R&D style learning in cases where humans don't need this. But, this sort of learning might still be sufficiently helpful that AIs are ongoingly applying it in all domains where increased cognitive performance has substantial returns.
(Separately, I think talking about “sample efficiency” is often misleading. Humans often do things that have never been done before. That’s zero samples, right? What does sample efficiency even mean in that case?)
Sure, but we can still loosely evaluate sample efficiency relative to humans in cases where some learning occurs (potentially including stuff like learning on the job). As in, how well can the AI learn from some data relative to humans. I agree that if humans aren't using learning in some task then this isn't meaningful (and the distinction between learning and other cognitive abilities is itself fuzzy).
On the foom side, Paul Christiano brings up Eliezer Yudkowsky’s past expectation that ASI “would likely emerge from a small group rather than a large industry” as a failed prediction here [disagreement 12] and as “improbable and crazy” here.
Actually, I don't think Paul says this is a failed prediction in the linked text. He says:
The Eliezer predictions most relevant to “how do scientific disciplines work” that I’m most aware of are incorrectly predicting that physicists would be wrong about the existence of the Higgs boson () and expressing the view that real AI would likely emerge from a small group rather than a large industry (pg 436 but expressed many places).
My understanding is that this is supposed to be read as "[incorrectly predicting that physicists would be wrong about the existence of the Higgs boson ()] and [expressing the view that real AI would likely emerge from a small group rather than a large industry]", Paul isn't claiming that the view that real AI would likely emerge from a small group is a failed prediction!
On "improbable and crazy", Paul says:
The debate was about whether a small group could quickly explode to take over the world. AI development projects are now billion-dollar affairs and continuing to grow quickly, important results are increasingly driven by giant projects, and 9 people taking over the world with AI looks if anything even more improbable and crazy than it did then. Now we're mostly talking about whether a $10 trillion company can explosively grow to $300 trillion as it develops AI, which is just not the same game in any qualitative sense. I'm not sure Eliezer has many precise predictions he'd stand behind here (setting aside the insane pre-2002 predictions), so it's not clear we can evaluate his track record, but I think they'd look bad if he'd made them.
Note that Paul says "looks if anything even more improbable and crazy than it did then". I think your quotation is reasonable, but it's unclear if Paul thinks this is "crazy" or if he thinks it's just more incorrect and crazy-looking than it was in the past.
I just reworded from “as a failed prediction” to “as evidence against Eliezer’s judgment and expertise”. I agree that the former was not a good summary, but am confident that the latter is what Paul intended to convey and expected his readers to understand, based on the context of disagreement 12 (which you quoted part but not all of). Sorry, thanks for checking.
Oh no, I didn't realize your perspective was this gloomy. But it makes a lot of sense. Actually it mostly comes down to, you can just dispute the consensus[1] that the classically popular Yudkowskyian/Bostromian views have been falsified by the rise of LLMs. If they haven't, then fast takeoff now is plausible for mostly the same reasons that we used to think it's plausible.
I think the path from here to AGI is bottlenecked by researchers playing with toy models, and publishing stuff on arXiv and GitHub.
I think there is some merit to just asking these people to do something else. Maybe not a lot of merit, but a little more than zero, at least for some of them. Especially if they are on this site. Not with a tweet, but by using your platform here. (Plausibly you have already considered this and have good reasons for why it's a terrible idea, but it felt worth suggesting.)
I'm not sure if this is in fact a consensus, but it sure feels that way.
This is a two-post series on AI “foom” (this post) and “doom” (next post).
A decade or two ago, it was pretty common to discuss “foom & doom” scenarios, as advocated especially by Eliezer Yudkowsky. In a typical such scenario, a small team would build a system that would rocket (“foom”) from “unimpressive” to “Artificial Superintelligence” (ASI) within a very short time window (days, weeks, maybe months), involving very little compute (e.g. “brain in a box in a basement”), via recursive self-improvement. Absent some future technical breakthrough, the ASI would definitely be egregiously misaligned, without the slightest intrinsic interest in whether humans live or die. The ASI would be born into a world generally much like today’s, a world utterly unprepared for this new mega-mind. The extinction of humans (and every other species) would rapidly follow (“doom”). The ASI would then spend countless eons fulfilling its desires, desires which we humans would find to be bizarre and pointless.
Now, I don’t endorse every word of that foom & doom scenario above—for example, I don't think “foom” requires recursive self-improvement. But I’m in much closer agreement with that scenario than the vast majority of AI safety & alignment researchers today, who tend to see the “foom & doom” scenario above as somewhere between “extraordinarily unlikely” and “already falsified”!
Those researchers are not asking each other “is it true?”, but rather “lol, can you believe that some people used to believe that?”.[1] Oh well. Laugh all you want. It’s still what I believe.
Conversely, from my perspective as a foom & doomer, it’s the mainstream contemporary AI alignment discourse that feels increasingly foreign and strange. How, I ask myself, do so many seemingly reasonable people wind up with such wildly, bafflingly over-optimistic beliefs as “P(doom)≲50%”??
Anyway, my main goal in these two posts is to explore how I wind up in such a different place from most other alignment researchers today, on the question of foom & doom. I don’t particularly expect to win skeptical readers over to my side, but would at least like to convey that foom & doom is a story that hangs together and deserves a modicum of consideration.
These posts are mainly exploring my disagreement with a group of researchers who think of LLMs[2] as being on a smooth, continuous path towards ASI. This group comprises probably >95% of people working on AI alignment, safety, and governance today[3].
(For many people in this group, if you ask them directly whether there might be important changes in AI algorithms, training approaches, etc., between today and ASI, they’ll say “Oh yes, of course that’s possible”. But if you ask them any other question about the future of AI, they’ll answer as if they expect no such change.)
There’s a very short answer to why I disagree with those LLM-focused researchers on foom & doom: They expect LLMs to scale to ASI, and I don’t. Instead I expect that ASI will be a very different AI paradigm: “brain-like AGI” (more on which below and in the next post).
So if you’re an LLM-focused reader, you may be thinking: “Well, Steve is starting from a weird premise, so no wonder he gets a weird conclusion. Got it. Cool. …Why should I bother reading 15,000 more words about this topic?”
But before you go, I do think there are lots of interesting details in the story of exactly how those different starting premises (LLMs vs a different paradigm) flow down to wildly divergent views on foom & doom.
And some of those details will also, incidentally, clarify disagreements within the LLM-focused community. For example,
So I’m hopeful that these posts will have some “food for thought” for doomers like me trying to understand where those P(doom)≲50% “optimists” are coming from, and likewise for “optimists” trying to better understand doomers.
This post covers “foom”, my belief that there will be a sharp localized takeoff, in which a far more powerful and compute-efficient kind of AI emerges suddenly into an utterly unprepared world. I explore the scenario, various arguments against it and why I don’t find them compelling, and the terrifying implications (if true) on our prospects for AI governance, supervised oversight, testing, and more. Here’s the outline:
LLMs are very impressive, but they’re not AGI yet—not by my definition. For example, existing AIs are nowhere near capable of autonomously writing a business plan and then founding a company and growing it to $1B/year revenue, all with zero human intervention. By analogy, if humans were like current AIs, then humans would be able to do some narrow bits of founding and running companies by ourselves, but we would need some intelligent non-human entity (angels?) to repeatedly intervene, assign tasks to us humans, and keep the larger project on track.
Of course, humans (and groups of humans) don’t need the help of angels to conceive and carry out ambitious projects, like building businesses or going to the moon. We can do it all by ourselves. So by the same token, future AGIs (and groups of AGIs) won’t need the help of humans.
…So that’s my pitch that AGI doesn’t exist yet. And thus, the jury is still out on what AGI (and later, ASI) will look like, or how it will be made.
My expectation is that, for better or worse, LLMs will never be able to carry out those kinds of projects, even after future advances in scaffolding, post-training, and so on. If I’m right, that wouldn’t mean that those projects are beyond the reaches of AI—it’s clearly possible for some algorithm to do those things, because humans can! Rather it would mean that LLMs are the wrong algorithm class. Instead, I think sooner or later someone will figure out a different AI paradigm, and then we’ll get superintelligence with shockingly little compute, shockingly little effort, and in shockingly little time. (I’ll quantify that later.)
Basically, I think that there's a “simple(ish) core of intelligence”, and that LLMs don't have it. Instead, people are hacking together workarounds via prodigious quantities of (in Ajeya’s terminology) “scale” (a.k.a. compute, §1.5 below) and “schlep” (a.k.a. R&D, §1.7 below). And researchers are then extrapolating that process into the future, imagining that we’ll turn LLMs into ASI via even more scale and even more schlep, up to quantities of scale and schlep that strike me as ludicrously unnecessary and implausible.
The whole cortex is (more-or-less) a uniform randomly-initialized learning algorithm, and I think it’s basically the secret sauce of human intelligence. Even if you disagree with that, we can go up a level to the whole brain: the human brain algorithm has to be simple enough to fit in our (rather small) genome.[4] And not much evolution happened between us and chimps. And yet our one brain design, without modification, was able to invent farming and science and computers and rocket ships and everything else, none of which has any straightforward connection to tasks on the African savannah.
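For a rough sense of scale (back-of-the-envelope, and a generous upper bound, since only a fraction of the genome has anything to do with brain architecture):

```python
# Whole-genome information content, as a loose upper bound on how complex the
# brain's learning algorithm(s) could possibly be to specify.
base_pairs = 3.1e9                     # approximate size of the human genome
megabytes = base_pairs * 2 / 8 / 1e6   # 2 bits per base pair
print(f"~{megabytes:.0f} MB")          # ~775 MB total, a tiny specification by modern software/ML standards
```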
Anyway, the human cortex is this funny thing with 100,000,000 repeating units, each with 6-ish characteristic layers with correspondingly different neuron types and connection patterns, and so on. Nobody knows how it works. You can look up dozens of theories explaining what each of the 6-ish layers is doing and how, but they all disagree with each other. Some of the theories are supported by simulations, but those simulations are unimpressive toy models with no modern practical applications whatsoever.
Remember, if the theories were correct and complete, then they could be turned into simulations able to do all the things that the real human cortex can do[5]—vision, language, motor control, reasoning, inventing new scientific paradigms from scratch, founding and running billion-dollar companies, and so on.
So here is a very different kind of learning algorithm waiting to be discovered, one which we know can scale to AGI, and then to ASI beyond that (per §1.7.2 below). And people are working on it as we speak, and they haven’t succeeded yet, despite decades of work and billions of dollars of resources devoted to figuring it out.
(To be clear, I desperately hope they continue to fail! At least until we have a much better plan for Safe & Beneficial brain-like AGI. See especially §1.8.4 below and the next post.)
Here are three perspectives:
Another place this comes up is robotics:
Me: “Future powerful AI will already be a good robotics algorithm!”[6]
…After all, if a human wants to use a new kind of teleoperated robot, nobody needs to do a big R&D project or breed a new subspecies of human. You just take an off-the-shelf bog-standard human brain, and if it wants to pilot a new teleoperated robot, it will just autonomously figure out how to do so, getting rapidly better within a few hours. By the same token, there can be one future AGI design, and it will be able to do that same thing.
I think of this as a classic @paulfchristiano-style rebuttal (see e.g. Yudkowsky and Christiano discuss "Takeoff Speeds", 2021).
In terms of reference class forecasting, I concede that it’s rather rare for technologies with extreme profit potential to have sudden breakthroughs unlocking massive new capabilities (see here), that “could have happened” many years earlier but didn’t. But there are at least a few examples, like the 2025 baseball “torpedo bat”, wheels on suitcases, the original Bitcoin, and (arguably) nuclear chain reactions.[7]
Also, there’s long been a $1M cash bounty plus eternal fame and glory for solving the Riemann Hypothesis. Why hasn’t someone already solved it? I dunno! I guess it’s hard.
“Ah, but if companies had been putting billions of dollars into solving the Riemann Hypothesis over the last decade, as they have been doing for AI, then the Riemann Hypothesis surely would have been solved by now, right?” I dunno! Maybe! But not necessarily.
“Ah, but if the Riemann Hypothesis is that hard to solve, it must be because the proof is extraordinarily intricate and complicated, right?” I dunno! Maybe! But not necessarily. I think that lots of math proofs are elegant in hindsight, but took a lot of work to discover.
As another example, there was widespread confusion about causal inference for decades before Judea Pearl and others set us straight, with a simple and elegant framework.
So likewise, there can be a “simple(ish) core of intelligence” (§1.3 above) that is taking people a while to discover.
Of course, the strongest argument to me is the one in §1.3.1 above: the human cortex is an existence proof that there are important undiscovered insights in the world of learning algorithms.
Well, I don’t think LLMs will scale to ASI. Not with multimodal data, not with RL from Verifiable Rewards post-training, not with scaffolding, not with anything else, not soon, not ever. That’s my belief, which I won’t argue for here. Seems like we’ll find out one way or the other quite soon.
(To be clear, I could be wrong, and certainly don’t want to discourage people from contingency-planning for the possibility that souped-up future LLM systems will scale to ASI.)
I dispute the word “just”. Different ML algorithms can be quite different from each other!
I think the new paradigm will bring a shocking phase shift allowing dramatically more capabilities from dramatically less compute (see later sections), along with a shocking phase shift in the difficulty of technical alignment, including proneness to egregious scheming and deception (next post), as compared to current and future LLMs.
I have two responses.
First, I disagree with that prediction. Granted, probably LLMs will be a helpful research tool involved in finding the new paradigm, but there have always been helpful research tools, like PyTorch and arXiv and Google, and I don’t expect LLMs to be in a fundamentally different category from those other helpful research tools.
Second, even if it’s true that LLMs will discover the new paradigm by themselves (or almost by themselves), I’m just not sure I even care. I see the pre-paradigm-shift AI world as a lesser problem, one that LLM-focused AI alignment researchers (i.e. the vast majority of them) are already focusing on. Good luck to them. And I want to talk about what happens in the crazy world that we enter after that paradigm shift.
We already know that different ML approaches can have different quantitative relationships between compute and performance. For example, Fig. 7 of the classic 2020 “Scaling Laws” paper shows perplexity scaling laws for LSTMs and transformers, and they do not overlay. I expect the next paradigm to be a very different learning algorithm, so the compute-versus-performance curves that we’re used to today are just irrelevant, from my perspective. After the new paradigm, all bets are off.
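As a schematic (not the paper’s exact fit), each architecture traces out its own compute-performance power law, something like:

$$L(C) \approx a \cdot C^{-\alpha}$$

where the constant $a$ and exponent $\alpha$ both depend on the architecture. That’s why the LSTM and transformer curves in that figure don’t lie on top of each other, and why I don’t expect today’s curves to tell us anything about a genuinely different learning algorithm.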
Instead, my guess (based largely on lots of opinions about exactly what computations the human brain is doing and how) is that human-level human-speed AGI will require not a data center, but rather something like one consumer gaming GPU—and not just for inference, but even for training from scratch.
So, whereas most people would say “Groups of humans can create $1B/year companies from scratch without any divine intervention, but groups of LLMs cannot create $1B/year companies from scratch without any human intervention. Welp, I guess we need even more training compute…”
…I would instead say “The latest LLMs are the wrong AI paradigm, but next-paradigm AI will be able to do things like that, starting from random initialization, with 1000× less training compute than was being used to train LLMs in 2022![8]
I won’t defend that here; see Thoughts on hardware / compute requirements for AGI for some of my thinking.
Instead, I’ll focus on how very low training compute feeds into many of my other beliefs.
I feel strongly that it would be better if AGI were invented later than sooner (other things equal, on the current margin), because I think we have a lot more work to do on technical alignment (among many other things), and we’re making progress but are nowhere near ready, and we need to be doing this work way ahead of time (§1.8.4 below).
…But I’m not sure that actual existing efforts towards delaying AGI are helping.
I think the path from here to AGI is bottlenecked by researchers playing with toy models, and publishing stuff on arXiv and GitHub. And I don’t think most existing public advocacy against building AGI will dissuade those researchers.
The problem is: public advocacy is way too centered on LLMs, from my perspective.[9] Thus, those researchers I mentioned, who are messing around with new paradigms on arXiv, are in a great position to twist “Pause AI” type public advocacy into support for what they’re doing!
“You don’t like LLMs?”, the non-LLM AGI capabilities researchers say to the Pause AI people, “Well how about that! I don’t like LLMs either! Clearly we are on the same team!”
This is not idle speculation—almost everyone that I can think of who is doing the most dangerous kind of AI capabilities research, the kind aiming to develop a new more-powerful-than-LLM AI paradigm, is already branding their work in a way that vibes with safety. For example, see here where I push back on someone using the word “controllability” to talk about his work advancing AI capabilities beyond the limits of LLMs. Ditto for “robustness” (example), “adaptability” (e.g. in the paper I was criticizing here), and even “interpretability” (details).
I think these people are generally sincere but mistaken, and I expect that, just as they have fooled themselves, they will also successfully fool their friends, their colleagues, and government regulators. Well, the government regulators hardly matter anyway, since regulating the activity of “playing with toy models, and publishing stuff on arXiv and GitHub” is a hell of an ask—I think it’s so unlikely to happen that it’s a waste of time to even talk about it, even if it were a good idea all-things-considered.[10]
(I think non-LLM-focused x-risk outreach and education is good and worthwhile. I expect it to be only slightly helpful for delaying AGI, but “slightly helpful” is still helpful, and more importantly outreach and education has many other good effects like bolstering safety research.)
Once the new paradigm is known and developed (see below), the actors able to train ASI from scratch will probably number in the tens of thousands, spread all around the world. We’re not just talking about five giant firms with gazillion-dollar data centers, as LLM-focused people tend to imagine.
Thus, for example, if governments know where all the giant data centers are and what code they’re running—well, I guess that’s probably better than governments not knowing that. But I think it’s only marginally helpful, in itself.
(That’s not to say that there is nothing useful happening in the space of regulating AGI. There are various things that would be slightly helpful,[11] and again, slightly helpful is still helpful.)
A classic x-risk argument says that ambitious callous AGIs would be motivated to wipe out humans in order to better accomplish their goals. And then a classic anti-x-risk counterargument replies that no, wiping out humans would be a murder-suicide, because there would be no one to run the electrical grid and chip factories etc. And while murder-suicide is a possible AGI motivation, it’s a less likely motivation than the AGI having long-term goals that benefit from its own survival.
Then what’s the pro-x-risk counter-counter-argument?
One approach is to tell a story that involves AGI maneuvering into power, then the world builds ever more chips and robots over a few decades, and then human extinction happens (more in “Response to Dileep George: AGI safety warrants planning ahead” §3.3.4 or this Carl Shulman interview).
…But what I really believe is that AGIs could wipe out humans and bootstrap their way back to running the world on their own, after very little prep work—see “What does it take to defend the world against out-of-control AGIs?” §3.3.3 for details. And this hypothesis starts seeming much more plausible if there are already enough chips lying around to run hundreds of millions of human-level human-speed AGIs. And that’s what I expect to be the case.
So again, this isn’t much of a crux for doom, but I still feel like it’s an important ingredient of the picture in my head.
I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains). How much is “very little”? I dunno, maybe 0–30 person-years of R&D? Contrast that with AI-2027’s estimate that crossing that gap will take millions of person-years of R&D.
Why am I expecting this? I think the main reason is what I wrote about the “simple(ish) core of intelligence” in §1.3 above.
But here are a couple additional hints about where I’m coming from:
I’m definitely not saying that it will be easy to develop the future scary paradigm to ASI from scratch. Instead I’m talking about getting to ASI from the point where the paradigm has already crossed the threshold of being clearly relevant to AGI. (LLMs are already well past this threshold, but the future scary paradigm is obviously not.) In particular, this would be the stage where lots of people believe it’s a path to AGI in the very near future, where it’s being widely used for intellectual work, and/or it’s doing stuff clearly related to the Safe & Beneficial AGI problem, by creating visibly impressive and proto-AGI-ish useful artifacts.
It takes a lot of work to get past that threshold! Especially given the existence of LLMs. (That is: the next paradigm will struggle to get much attention, or make much money, until the next paradigm is doing things that LLMs can’t do—and LLMs can do a lot!)
Why do I think getting to “relevant at all” takes most of the work? This comes down to a key disanalogy between LLMs and brain-like AGI, one which I’ll discuss much more in the next post.
The power of LLMs comes almost entirely from imitation learning on human text. This leads to powerful capabilities quickly, but with a natural ceiling (i.e., existing human knowledge), beyond which it’s unclear how to make AI much better.
Brain-like AGI does not involve that kind of imitation learning (again, more in the next post). Granted, I expect brain-like AGI to also “learn from humans” in a loose sense, just as humans learn from other humans. But the details are profoundly different from the kind of imitation learning used by LLMs. For example, if Alice says something I don’t understand, I will be aware of that fact, and I’ll reply “huh?”. I won’t (usually) just start repeating what Alice says in that same context. Or if I do, this will not get me to any new capability that LLMs aren’t already covering much better. LLMs, after all, are virtuosos at simply repeating what they heard people say during pretraining, doing so with extraordinary nuance and contextual sensitivity.
As another suggestive example, kids growing up exposed to grammatical language will learn that language, but kids growing up not exposed to grammatical language will simply create a new grammatical language from scratch, as in Nicaraguan Sign Language and creoles. (Try training an LLM from random initialization, with zero tokens of grammatical language anywhere in its training data or prompt. It’s not gonna spontaneously emit grammatical language!) I think that’s a good illustration of why imitation learning is just entirely the wrong way to think about what’s going on with brain algorithms and brain-like AGI.
For brain-like AGI, all the potential blockers to ASI that I can imagine, would also be potential blockers for crossing that earlier threshold of being clearly relevant to AGI at all, a threshold that requires using language, performing meaningful intellectual work that LLMs can’t do, and so on.
Instead of imitation learning, a better analogy is to AlphaZero, in that the model starts from scratch and has to laboriously work its way up to human-level understanding. It can’t just regurgitate human-level understanding for free. And I think that, if it can climb up to human-level understanding, it can climb past human-level understanding too, with a trivial amount of extra R&D work and more training time—just as, by analogy, it takes a lot of work to get AlphaZero to the level of a skilled human, but then takes very little extra work to make it strongly superhuman.
And speaking of strongly superhuman:
The human brain algorithm has lots of room for capabilities improvement, including (1) more neurons, (2) speed, (3) motivation (e.g. intellectual curiosity, being interested in ideas and getting things done rather than status and gossip), (4) anything else that makes human geniuses tower over human midwits, but much more of it, (5) things like cloning, merging weight-updates from clones, high-bandwidth communication, etc. More at Response to Blake Richards: AGI, generality, alignment, & loss functions §3.2.
One Paul-Christiano-style counterargument (cf. his post “Takeoff speeds”) would be: “All those things you listed under ‘plenty of room at the top’ above for why AGIs can outperform humans—scale, speed, cloning, etc.—are things that could happen before, not after, human-level, making up for some other deficiency, as opposed to your implied suggestion that we’ll get to human-level in a human-brain-like way first, and only then rapidly scale, speed it up, clone many copies, etc.”
My rebuttal is: for a smooth-takeoff view, there has to be some correspondingly-slow-to-remove bottleneck that limits the rate of progress. In other words, you can say “If Ingredient X is an easy huge source of AGI competence, then it won’t be the rate-limiter, instead something else will be”. But you can’t say that about every ingredient! There has to be a “something else” which is an actual rate-limiter, that doesn’t prevent the paradigm from doing impressive things clearly on track towards AGI, but that does prevent it from being ASI, even after hundreds of person-years of experimentation.[13] And I’m just not seeing what that could be.
Another point is: once people basically understand how the human brain figures things out in broad outline, there will be a “neuroscience overhang” of 100,000 papers about how the brain works in excruciating detail, and (I claim) it will rapidly become straightforward to understand and integrate all the little tricks that the brain uses into AI, if people get stuck on anything.
I wind up feeling like the wall-clock time between the new paradigm being “seemingly irrelevant to AGI” and ASI is, I dunno, two years on the high side, and zero on the low side.
Specifically, on the low side, I wouldn’t rule out the possibility that a single training run is the first to surpass both the “clearly relevant to AGI” threshold and the ASI threshold, in which case they would happen basically simultaneously (perhaps within the same week).
To be clear, the resulting ASI after those 0–2 years would not be an AI that already knows everything about everything. AGI and ASI (in my opinion) aren’t about already knowing things, but rather they’re about not knowing things, yet being able to autonomously figure them out (§1.7.1 above). So the thing we get after the 0–2 years is an AI that knows a lot about a lot, and if it wants to dive deeper into some domain, it can do so, picking it up with far more speed, depth, and insight than any human could.
Think of an army of a million super-speed telepathic scaled-up John von Neumann clones. If you ask them some question about cryptocurrency, then maybe they won’t know the answer off the top of their head, because maybe it happens that there wasn’t any information about cryptocurrency in their training environment to date. But then they’ll go spend a day of wall-clock time (≈ months or years of subjective time) reading up on cryptocurrency and all its prerequisites, and playing with the code, and so on, and then they’ll have a deep, beyond-world-expert-level understanding.
Even if the next paradigm requires very few person-years of R&D to get from “clearly relevant to ASI” to “actual ASI”, it may take a long time if the individual training runs are slow. But I don’t think that will be much of a limiter.
Instead, I expect that the next paradigm will involve so little compute, and be so amenable to parallelization, that trainings from “birth” (random initialization) to adult-human-level will take maybe a couple weeks, notwithstanding the fact that human brains require decades.[14] And I think picking the low-hanging fruit of efficiency and parallelization will happen early on, probably during the earlier “seemingly irrelevant” stage—why would anyone ever run a year-long training, when they can instead spend a few months accelerating and parallelizing the algorithm, and then run the same training much faster?
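For a rough sense of the gap being claimed here (taking “decades” to mean ~30 years and “a couple weeks” to mean ~2 weeks):

$$\frac{30 \times 52\ \text{weeks}}{2\ \text{weeks}} \approx 800\times$$

i.e. roughly three orders of magnitude of combined serial speedup and parallelism relative to a human childhood.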
The wall-clock time for takeoff depends in part on people making decisions, and people could decide to go actually slowly and incrementally. Even in the “single training run” case, heck, in principle, that training run could happen over the course of a zillion years, with gradient descent being performed by one guy with an abacus. But given how little compute and R&D are involved in getting to ASI, I think the only way to get deliberate slowdown would involve excellent secrecy on the algorithms, and one group (or consortium) way in the lead, and then this one group “burns their lead” in order to do incremental testing and other safety interventions.[15]
We should keep possibilities like that in mind. But I see it as realistically making takeoff smoother by months, at best, not years.
As mentioned above, some LLM-focused people like the AI-2027 authors agree with me about takeoff being pretty sharp, with the world radically changing over the course of months rather than years. But they get that conclusion via a very different path than I do.
Recall from Bostrom (2014) the (qualitative) formula:
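Roughly (paraphrasing Bostrom):

$$\text{Rate of change of intelligence} = \frac{\text{Optimization power}}{\text{Recalcitrance}}$$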
The LLM-focused people get fast “rate of change of intelligence” under an assumption that “recalcitrance” (difficulty of improving AI) is high and steeply increasing, but the “optimization power” brought to bear on improving AI is even higher, and even more steeply increasing.
Whereas I think we’re in the wrong paradigm today, but when that changes, recalcitrance will be quite low, at least across the range from “doing anything impressive whatsoever” to ASI. So we’ll get sharp takeoff (across that range) even without any particular increase in optimization power being applied to AI research.
Of course, somewhere between “doing anything impressive whatsoever” and ASI, we’ll get AIs that can do excellent AI capabilities research. And that could make takeoff faster still.[16] But I don’t think that would change my general picture very much; it would just shorten this already-short period a bit further, by effectively clipping off the end of it.
This is an area where I kinda disagree with not just Paul Christiano but also Eliezer, who historically has seemed to put a lot of emphasis on the ability of AI to do excellent AI R&D. I think where Eliezer was coming from (see e.g. Intelligence Explosion Microeconomics (2013) p56) was: human brains are comically inefficient (in his view), and human institutions even more so, and thus AI is going to be much better than humans at AI R&D, leading to rapid self-improvement. Whereas I think that’s kinda missing the point, because by the time AI is already that good at AI R&D, we’re already past the critical and controversial part. Remember the “simple(ish) core of intelligence” in §1.3 above—I think AI will get that good at AI R&D via a kind of competence that generalizes into every other domain too.
In other words, I think that, if you understand the secret sauce of the human brain, then you straightforwardly and quickly get to an ASI at the level of a million super-speed telepathic scaled-up John von Neumann clones. Then Eliezer would respond: “Ah, but then that super John von Neumann clone army would be able to do some kick-ass AI research to make their algorithms even more powerful still!” And, yeah! That’s true! But by that point, does it even matter?
A lot of things seem to point in that direction, including:
There may be safety or ethical[17] concerns delaying the deployment of these new-paradigm AIs;
Indeed, I’m not even sure if there will be much “internal deployment” to speak of, for the same reasons. I think ASI may well arrive before the developers have really gotten past the stage of testing and exploration.
So I think the Eliezer-ish scenario where strong superintelligence escapes onto the internet, in a world otherwise much like today, is quite plausible, and is my central expectation right now.
Of course, the future world won’t be exactly like today. It will presumably have more and better chips. It will have better, cheaper, and far more widespread LLMs, and people will take them for granted, complain about them, and/or forget what life was like before them, just as we do now for cell phones and social media. The already-ongoing semantic bleaching of the terms “AGI” and “ASI” will continue, until the terms become just meaningless AI company marketing speak. Various things will happen in geopolitics. Perhaps some early version of next-paradigm-AI will be getting used profitably in e.g. the robotics sector.
…But nothing like the kind of obvious common-knowledge pre-ASI craziness envisioned in Paul-style smooth-takeoff scenarios (e.g. “There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles.”).
Needless to say, if I’m right, then we need to be doing serious prep work for this next-paradigm AI, even while this next-paradigm AI is obscure, seemingly irrelevant, and only good for running toy-model demos or unsexy niche applications. Or maybe before they’re even good for any of that!
Luckily, if the next paradigm is brain-like AGI, as I expect, then we can study brains right now, and thus have at least something to go on in understanding the nature of the threat and what to do about it. That’s of course what I’m working on myself.
The obvious, well-known problem with AI-assisted alignment research is the chicken-and-egg problem. Unaligned AIs won’t actually care about robustly solving the alignment problem. So at best, the AIs will care only about impressing us—and we have abundant empirical evidence that people can be impressed by incorrect alignment ideas. At worst, the AIs will be trying to deceive and manipulate us. See further discussion in §4 of my post “Reward Button Alignment”.
But in the context of this post, we have an additional problem on top of that: I expect that, once the next-paradigm AIs are competent enough to meaningfully contribute to alignment research at all, they will be very easily able to invent ASI. Inventing ASI will be (at that point) much, much easier than alignment research—the former will entail just a bit more iteration and putting pieces together (since we’ll already be almost there!), whereas the latter will entail tricky conceptual work, anticipating novel problems, and out-of-the-box thinking.
(I’m talking specifically about getting help from AI-of-the-next-paradigm. A different topic is getting help from LLMs. I’m all for getting help from LLMs where possible! But as I mentioned in §1.4.4 above, I expect that the role of LLMs is, and will continue to be, as a mundane productivity enhancer in the same bucket as Integrated Development Environments (IDEs), PyTorch, arXiv, google, etc.,[18] as opposed to an autonomous researcher akin to humans. I just don’t think they’ll get that good.)
@Joe Carlsmith’s “AI for AI safety” brings up three categories of things to do with AI to make the ASI risk situation better:
- Safety progress: our ability to develop new levels of AI capability safely,
- Risk evaluation: our ability to track and forecast the level of risk that a given sort of AI capability development involves, and
- Capability restraint: our ability to steer and restrain AI capability development when doing so is necessary for maintaining safety.
I don’t really see any of these things working, at least not in the form that Joe and the other people seem to be imagining. Takeoff, sharp as it is, will get very much sharper still if word gets out about how this kind of AI works, and then there’d be no time to get anything done. (And “capability restraint” via governance would be off the table, given how little compute is required, see §1.5–§1.6 above.) Or if things stay mum, then that rules out public risk evaluations, widespread alignment research, or most kinds of AI-assisted societal resilience.
Moreover, the continuous learning nature of the future paradigm (see §1 of “Sharp Left Turn” discourse: An opinionated review) would mean that “AI capabilities” are hard to pin down through capabilities elicitation—the AI might not understand something when you test it, but then later it could figure it out.
(See also §2.6 of the next post on further challenges of weaker AIs supervising stronger AIs.)
Instead, the only forms of “AI for AI safety” that seem plausible to me are much closer to what Eliezer and others were talking about a decade ago: (1) a “pivotal act” (which, as Scott Alexander points out, will feel quite different if people actually find themselves living inside the scenario that I expect), and (2) very powerful AIs with good motivations, not straightforwardly following human instructions, but rather doing what they think is best. I won’t justify that in detail; it’s out of scope.
An AI with a DSA (decisive strategic advantage) is one that could unilaterally crush or co-opt all competition, should it choose to. This would constitute a terrifying single-point-of-failure for the whole future. Thus, some people understandably wonder whether we could just, y’know, not have that happen. For example, @Joe Carlsmith’s On “first critical tries” in AI alignment: “I think we should try to make it the case that no AI system is ever in a position to kill everyone and take over the world.”
I’ll leave aside the question of whether DSAs are bad—wait, sorry, they’re definitely bad. But maybe every option is bad, in which case we would have to figure out which option is least bad.[19] Anyway, my goal in this subsection is to argue that, assuming we want to avoid a DSA, I don’t see any way to do that.
A useful notion (after Eliezer via Paul Christiano) is “free energy”, meaning unexploited opportunities that an AI might use to gain power and influence. It includes profitable opportunities that have not yet been taken. It includes chips that have neither been already hacked into, nor secured, nor had their rental price massively bid upwards. It includes brainwashable humans who have neither been already brainwashed, nor been defended against further brainwashing. Things like that.
Free energy depends on competence: the very same environment may have no free energy for a human, nor for a midwit AI, but tons of free energy for a superintelligent AI.
(Free energy also depends on motivation: an opportunity to extort humans by threatening a bioweapon would constitute “free energy” for an AI that doesn’t care about human welfare or norms, but not for an AI that does. But I’ll put that aside—that gets into offense-defense balance and other issues outside the scope of this series.)
Anyway, Paul Christiano suggests that “aligned AI systems can reduce the period of risk of an unaligned AI by … consuming the ‘free energy’ that an unaligned AI might have used to grow explosively.”
Well, my concern is that when this next paradigm goes from “basically useless” to “million super-speed scaled-up telepathic John von Neumanns” in two years, or maybe much less than two years, there’s just an extraordinary amount of free energy appearing on the scene, very fast. It’s like a Mount-Everest-sized pile of gunpowder that will definitely be consumed within a matter of months. It’s pleasant to imagine this happening via a very distributed and controlled gradual burn. But c’mon. There’s gonna be a massive explosion.
Like, suppose I’m wrong about blasting through human level, and instead we get midwit AGIs for five years, and they get deployed in a widespread, distributed way on chips around the world. Does that use up the free energy? No, because the million-John-von-Neumann ASI is still going to come along after that, and wherever it shows up, it can (if it chooses to) crush or outbid all the midwit AGIs, make crazy nanotech stuff, etc.
Ah, but what if there are not two but three steps from world-very-much-like-today to ASI? Midwit AI for a couple years, then genius AI for a couple years, then million-super-speed-John-von-Neumann ASI after that? Then I claim that at least one of those three steps will unlock an extraordinary amount of free energy, enough to easily crush everything that came before and grab unprecedented power. Ah, but what if it’s five steps instead of three? Ditto. The amount of gradualism necessary to fundamentally change this dynamic is far more gradual than I see as plausible. (Again, my central guess is that there will be no deployment at all before ASI.)
Ah, but what if we ban closed-source AI? Nope, I don’t think it helps. For one thing, that will just make takeoff even sharper in wall-clock time. For another thing, I don’t think that’s realistically enforceable, in this context where a small group with a few chips can put the pieces together into a system of vastly greater competence. For yet another thing, I think there are first-mover advantages, and an unstable dynamic in which “power begets power” for these future AIs. For example, the first AI to steal some chips will have extra competence with which to go after more chips—recall the zombie apocalypse movies, where ever more zombies can create ever more zombies. (Except that here, the zombies are superhumanly ambitious, entrepreneurial, patient, etc.) Or they can use the extra compute to self-improve in other ways, or subvert competition.
Ah, but what if some AI safely detonates the free energy by making the world resilient against other powerful AIs—e.g. it autonomously hacks into every data center on Earth, hardens the security (or just destroys the chips!), maybe deploys a “gray goo defense system” or whatever, and then deletes itself? Well, that same AI clearly had a DSA! It’s just that it didn’t use its extraordinary power to install itself as a permanent Singleton. By the same token, one could imagine good outcomes like an AI that sets up a “long reflection” and defers to the results, shutting itself down when appropriate. Or an AI could gather power and hand it over to some particular human or institution. Many possibilities. But they still involve some AI having a DSA at some point. So they still involve a giant terrifying single point of failure.
I don’t know when the next paradigm will arrive, and nobody else does either. I tend to say things like “probably 5 to 25 years”. But who knows! For what it’s worth, here are some thoughts related to why I picked those numbers:
For long-timeline readers who think “probably 5-25 years” is too low:
I don’t think 2030 is so soon that we can confidently rule out ASI by then. A lot can happen in five years. Five years is how long it took to get from “LLMs don’t even exist at all” in 2018 to GPT-4 in 2023. And that’s an underestimate of how fast things can move: the path from 2018 to GPT-4 involved a number of bottlenecks that the next paradigm won’t—particularly building huge data centers and training up a huge pool of experts in machine learning, parallelization, hardware acceleration, and so on.
If we go a bit further, the entirety of deep learning was a backwater as recently as 2012, a mere 13 years ago.
A different argument goes: “the brain is so ridiculously complicated, and we’re so far from reverse-engineering it, that brain-like AGI could very well take much longer than 25 years”. For my response to that school of thought, see Intro to Brain-Like-AGI Safety §2.8, §3.7, and §3.8. To be clear, it could be more than 25 years. Technological forecasting is very hard. Can’t rule anything out. What do I know?
For short-timeline readers who think “probably 5-25 years” is too high:
I don’t think 2050 is so far away that we can confidently rule out that ASI will take that long. See discussion in §1.4.1 above.
I’m also skeptical that people will get there in under 5 years, just based on my own inside view of where people are at right now and the pace of recent progress. But again, who knows? I don’t rule anything out.
A lot of people seem to believe that either LLMs will scale to AGI within the next couple years, or this whole AGI thing is stupid hype.
That’s just so insane to me. If AGI is 25 years away (for the sake of argument), that still obviously warrants urgent planning right now. People routinely plan that far out in every other domain—climate change, building infrastructure, investing in personal health, saving for retirement, etc.
For example, if AGI is 25 years away, then, in my estimation, I’m much more likely to die from ASI apocalypse than from all other causes combined. And I’m not even that young! This is a real thing coming up, not a far-off abstract fantasy-land scenario.
Other than that, I don’t think it’s terribly decision-relevant whether we get ASI in 5 years versus 25 years, and accordingly I don’t spend much time thinking about it. We should obviously be contingency-planning for both.
Now you know the kind of “foom” I’m expecting: the development of strong superintelligence by a small group working on a new AI paradigm, with essentially no warning and few resources, leaving us with meagre hope of constraining this radical transition via conventional balance-of-power or governance mechanisms, and very little opportunity to test and iterate on any system remotely similar to the future scary ones.
So we need to be working frantically on technical alignment, sandbox test protocols, and more generally having a plan, right now, long before the future scary paradigm seems obviously on the path to AGI.
(And no, inventing that next AI paradigm is not part of the solution, but rather part of the problem, despite the safety-vibed rhetoric of the researchers who are doing exactly that as we speak—see §1.6.1.)
I am very unhappy to hold that belief, and it’s an unpopular belief in the era of LLMs, but I still think it’s true.
If that’s not bad enough, the next post will argue that, absent some future conceptual breakthrough, this kind of AI will be egregiously misaligned, deceptive, and indifferent to whether its users, programmers, or anyone else lives or dies. Next post: doom!
Thanks Charlie Steiner, Ishaan Koratkar, Seth Herd, and Justis Mills for critical comments on earlier drafts.
For example, (1) On the foom side, Paul Christiano brings up Eliezer Yudkowsky’s past expectation that ASI “would likely emerge from a small group rather than a large industry” as evidence against Eliezer’s judgment and expertise here [disagreement 12] and as “improbable and crazy” here. (2) On the doom side, the “literal genie” / “monkey’s paw” thing, where an AI would follow a specification literally, with catastrophic consequences, as opposed to interpreting natural-language requests with common sense, has likewise largely shifted from a doomer talking point to an anti-doomer mocking point. But I still believe in both those things—see §1.7 and §2.4 respectively.
“LLM” means “Large Language Model”. I’m using it as a synonym for a big class of things, also called “foundation models”, that often include multi-modal capabilities, post-training, tool use, scaffolding, and so on.
For example, this category includes pretty much everyone at OpenAI, Anthropic, DeepMind, OpenPhil, GovAI, CSET, the AISIs, and on and on.
As another example, I just randomly opened up Alignment Forum, and had to scroll through 20 posts before I found even one that was not related to the alignment properties of today’s LLMs, or otherwise premised on LLMs scaling continuously to ASI.
More broadly, it’s increasingly common in the discourse for people to simply equate “AI” with “LLMs” (as if no other type of AI exists?), and to equate “ASI” with “ASI before 2030 via pure scaling of LLMs” (as if 2040 or 2050 were a distant abstract fantasy-land?). This leads to an endless fountain of bad takes from all sides, which I frequently complain about (1, 2, 3, 4, …).
…in conjunction with the thalamus, basal ganglia, etc.
Someone still needs to do R&D for the hardware side of robotics, but not much! Indeed, teleoperated robots seem to be quite capable and inexpensive already today, despite very low demand.
Could nuclear chain reactions have happened many years earlier? The obvious answer is no: they were bottlenecked by advances in nuclear physics. Ah, but what if we lump together the nuclear chain reactions with all the supporting theory, and ask why that whole package couldn’t have happened many years earlier? But more to the point, if a historical lack of understanding of nuclear physics was a bottleneck delaying nuclear chain reactions, isn’t it likewise possible that a current lack of understanding of [????] is a bottleneck delaying that next AI paradigm today?
The training of GPT-4 used 2e25 FLOP (source: Epoch), and it probably happened mostly during 2022.
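(For a sense of scale relative to the single-consumer-GPU regime I discuss in §1.5, here’s a rough back-of-envelope sketch; the per-GPU throughput figure is an illustrative assumption on my part, not from the Epoch source:)

```python
# Rough scale check with illustrative numbers (not from the Epoch source).
gpt4_training_flop = 2e25
consumer_gpu_flop_per_s = 1e14   # assumed ballpark sustained throughput of one consumer GPU
seconds = gpt4_training_flop / consumer_gpu_flop_per_s
years = seconds / 3.15e7         # ~3.15e7 seconds per year
print(f"~{years:,.0f} GPU-years on a single consumer GPU")  # prints roughly 6,000+ years
```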
I imagine public advocates responding by saying something like:
> Well, we could remove LLMs from the narrative, and talk in more general terms about how AGI / ASI is some future technology, to be invented at some future date, and here’s why it’s dangerous and why we should urgently prepare for it right now via safety research, institution building, etc. Indeed, we x-risk people were saying exactly that message 10 years ago, and we were saying it 20 years ago, and we were saying it all the way back to Alan Turing 75 years ago. And nobody gave a shit! The vast majority of people, even AI experts, only started paying the slightest attention to AI x-risk when the message changed to: ‘Y’know, those LLMs, the ones that you can see with your own eyes? We’re talking about those. Or maybe, at most, the next generation of those, which are already being built.’ And that message—man, it’s not even our message! It’s a mutant cousin of our message, which, being far more memetically fit, drowned out our actual more nuanced message in the popular discourse.
And … yeah, sigh, I dunno.
You can’t put nuclear secrets on arXiv, but I find it hard to imagine AI toy model papers ever winding up in that category, even if it were objectively a good idea. See also the time that the USA put export restrictions on an algorithm; not only did the restrictions utterly fail to prevent proliferation, but they were also struck down as unconstitutional!
Other examples of probably-helpful-on-the-margin governance work: (1) it would be nice if governments would publicly announce that AI companies can collaborate for safety reasons without falling afoul of antitrust law; (2) maybe something about liability, e.g. this idea? No strong opinions, I haven’t thought about it much.
Things that qualify as “impressive and proto-AGI-ish” would include helping with AI alignment research, or AI capabilities research, or bioweapons research, or unlocking huge new commercial opportunities, or even just being “visibly intelligent”. LLMs (unlike next-paradigm AIs) are already well into the “impressive and proto-AGI-ish” stage, which by the way is a much lower bar than what Redwood Research people call “transformatively useful AI”.
An important aspect is the question of whether there’s widespread belief that this paradigm is a path to AGI, versus whether it’s just another exploratory subfield of AI. As an analogy, think of probabilistic programming today—it beats a few benchmarks, and it has a few niche commercial applications, and it has some enthusiastic boosters, but mostly nobody cares. (No offense!) My claim is that, very shortly before ASI (in terms of both wall-clock time and R&D effort), the algorithms that will develop into ASI will be similarly niche. That could be true even if the algorithms have some profitable commercial applications in robotics or whatever.
Or I suppose the rate-limiter could be that there are 10,000 “something else”s; but see discussion of “simple(ish) core of intelligence” in §1.3 above.
I’m assuming 100+-fold speedup compared to humans from a mix of serial speedup, parallelization (see discussion of “parallel experiences” here), and various human inefficiencies (relative to our goals with AGI). By the way, I mentioned in §1.5 that I think training-from-scratch will be possible with extraordinarily little compute, like a single consumer GPU—and if a single consumer GPU is really all that a researcher had, then maybe training-from-scratch would take many months. But what I actually expect is that researchers will at least be using ten H100s or whatever for their training runs, which is far more powerful, while still being very inexpensive, widely available, and all-but-impossible to track or govern.
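(To spell out the arithmetic behind “maybe a couple weeks” in the main text, here’s a minimal sketch; both inputs are just the illustrative figures from this footnote, not measurements:)

```python
# Illustrative arithmetic only; both inputs are assumptions, not measurements.
human_learning_years = 20            # "decades" of human learning, rounded
for speedup in (100, 300, 1000):     # the "100+-fold" speedup described above, at a few values
    weeks = human_learning_years * 52 / speedup
    print(f"{speedup}x speedup -> ~{weeks:.0f} weeks of wall-clock training")
# 100x -> ~10 weeks; 300x -> ~3 weeks; 1000x -> ~1 week
```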
I’m stating a possibility, not saying that I expect people to actually do this. As the doomer refrain goes: “I do not expect us to die with that much dignity.”
I say “could” instead of “will” because it’s at least conceivable that humans will remain in control and choose to not have AIs work on AI capabilities research.
I expect the future-scary-paradigm AIs to have a pretty obvious (and IMO legitimate) claim to phenomenal consciousness and moral patienthood, much more than LLMs do, thanks to the future scary AIs operating on human-brain-like algorithmic principles. Of course, I don’t know whether future developers will notice or care, and if they do, I don’t know how they’ll act on it. But still, I think the general dismissal of LLM welfare today (pace Anthropic hiring one guy to think about it) is not necessarily indicative of what will happen with the next paradigm.
For the record, a poll of my X followers says that LLMs are a bigger boon to programming than IDEs, although a substantial minority disagreed. Note the obvious caveats that future LLMs will be better than today’s LLMs and that some of my X followers may not be skilled users of LLMs (or IDEs, for that matter).
E.g. Michael Nielsen’s ASI existential risk: reconsidering alignment as a goal emphasizes that multipolar AI scenarios may lead to doom via unsolvable coordination problems around destructive technologies, related to the Vulnerable World Hypothesis. That seems bad! But the DSA thing seems bad too! Again, I’m not taking a stand here, just trying to understand the situation.