Eric Drexler's report Reframing Superintelligence: Comprehensive AI Services (CAIS) as General Intelligence reshaped how a lot of people think about AI (summary 1, summary 2). I still agree with many parts of it, perhaps even the core elements of the model. However, after looking back on it more than four years later, I think the general picture it gave missed some crucial details about how AI will go.

The problem seems to be that his report neglected a discussion of foundation models, which I think have transformed how we should think about AI services and specialization. 

The general vibe I got from CAIS (which may not have been Drexler's intention) was something like the following picture: 

For each task in the economy, we will train a model from scratch to automate the task, using the minimum compute necessary to train an AI to do well on the task. Over time, the fraction of tasks automated will slowly expand like a wave, starting with the tasks that are cheapest to automate computationally, and ending with the most expensive tasks. At some point, automation will be so widespread that it will begin to meaningfully feed into itself, increasing AI R&D, and accelerating the rate of technological progress.

The problem with this approach to automation is that it's extremely wasteful to train models from scratch for each task. It might make sense when training budgets are tiny — as they mostly were in 2018 — but it doesn't make sense when it takes 10^25 FLOP to reach adequate performance on a given set of tasks.

The big obvious-in-hindsight idea that we've gotten over the last several years is that, instead of training from scratch for each new task, we'll train a foundation model on some general distribution, which can then be fine-tuned using small amounts of compute to perform well on any task. In the CAIS model, "general intelligence" is just the name we can give to the collection of all AI services in the economy. In this new paradigm, "general intelligence" refers to the fact that sufficiently large foundation models can efficiently transfer their knowledge to obtain high performance on almost any downstream task, which is pretty closely analogous to how humans take on new jobs.

The fact that generalist models can be efficiently adapted to perform well on almost any task is an incredibly important fact about our world, because it implies that a very large fraction of the costs of automation can be parallelized across almost all tasks. 

Let me illustrate this fact with a hypothetical example.

Suppose we previously thought that it would take $1 trillion to automate each task in our economy, such as language translation, box stacking, and driving cars. Since automating each of these tasks costs $1 trillion, you might expect companies to slowly automate all the tasks in the economy, starting with the most profitable ones, and finally getting around to the least profitable ones once economic growth allowed us to spend enough money on automating not-very-profitable stuff.

But now suppose we think it costs $999 billion to create "general intelligence", which, once built, can be quickly adapted to automate any other task at a cost of $1 billion. In this world, we will go very quickly from being able to automate almost nothing to being able to automate almost anything. In other words, we will get one big innovation "lump", which is the opposite of what Robin Hanson predicted. Even if we won't invent monolithic agents that take over the world by being smarter than everything else, we won't have a gradual decades-long ramp-up to full automation either.
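To make the "lump" concrete, here is a minimal sketch of the two cost models from the hypothetical above. The dollar figures are the illustrative numbers from the text, not real estimates:

```python
# Toy comparison of the two automation cost models discussed above.
# All numbers are illustrative, taken from the hypothetical in the text.

def tasks_automatable(budget, fixed_cost, per_task_cost):
    """How many tasks a given budget can automate, if automation requires
    paying `fixed_cost` once up front and `per_task_cost` per task."""
    if budget < fixed_cost:
        return 0
    return int((budget - fixed_cost) // per_task_cost)

TRILLION = 1e12
BILLION = 1e9

for budget in [0.5 * TRILLION, 1 * TRILLION, 1.1 * TRILLION, 2 * TRILLION]:
    # Old picture: every task costs $1T trained from scratch (no shared cost).
    from_scratch = tasks_automatable(budget, 0, 1 * TRILLION)
    # Foundation-model picture: $999B once, then $1B per downstream task.
    foundation = tasks_automatable(budget, 999 * BILLION, 1 * BILLION)
    print(f"${budget / TRILLION:.1f}T budget: {from_scratch} tasks "
          f"(from scratch) vs {foundation} tasks (foundation model)")
```

Below the $999B threshold the foundation-model world automates nothing; just past it, it automates hundreds of tasks for the price of one from-scratch model, which is exactly the discontinuity the paragraph describes.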

Of course, the degree of suddenness in the foundation model paradigm is still debatable, because the idea of "general intelligence" is itself continuous. GPT-4 is more general than GPT-3, which was more general than GPT-2, and presumably this trend will smoothly continue indefinitely as a function of scale, rather than shooting up discontinuously after some critical threshold. But results in the last year or so have updated me towards thinking that the range from "barely general enough to automate a few valuable tasks" to "general enough to automate almost everything humans do" is only 5-10 OOMs of training compute. If this range turns out to be 5 OOMs, then I expect a fast AI takeoff under Paul Christiano's definition, even though I still don't think this picture looks much like the canonical version of foom.

Foundation models also change the game because they imply that AI development must be highly concentrated at the firm-level. AIs themselves might be specialized to provide various services, but the AI economy depends critically on a few non-specialized firms that deliver the best foundation models at any given time. There can only be a few firms in the market providing foundation models because the fixed capital costs required to train a SOTA foundation model are very high, and being even 2 OOMs behind the lead actor results in effectively zero market share. Although these details are consistent with CAIS, it's a major update about what the future AI ecosystem will look like.

A reasonable remaining question is why we'd ever expect AIs to be specialized in the foundation model paradigm. I think the reason is that generalist models are more costly to run at inference time compared to specialized ones. After fine-tuning, you will want to compress the model as much as possible, while maintaining acceptable performance on whatever task you're automating.
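The post doesn't name a specific compression method, but one standard technique for this kind of generalist-to-specialist compression is knowledge distillation, where a small specialized model is trained to match a large model's softened output distribution. Here is a minimal NumPy sketch of the distillation loss; the function names and the temperature value are my own illustrative choices:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch, with both distributions
    softened by the temperature (Hinton-style knowledge distillation)."""
    p = softmax(teacher_logits, temperature)  # soft targets from the big model
    q = softmax(student_logits, temperature)  # small specialist's predictions
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    # Conventional t^2 rescaling keeps gradients comparable across temperatures.
    return kl.mean() * temperature ** 2

# A student that exactly matches the teacher incurs zero loss:
logits = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(logits, logits))  # → 0.0
```

The point of the temperature is that the specialist learns the generalist's relative preferences over outputs, not just its top answer, which tends to preserve task performance while allowing a much smaller (cheaper-at-inference) model.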

The degree of specialization will vary according to the task you want to automate. Some tasks require very general abilities to do well. For example, being a CEO plausibly benefits from extreme generality, way beyond even human-level, such that it wouldn't make sense to make a CEO model less general even if doing so saved inference costs. On the other hand, language translation can plausibly be accomplished acceptably using far less compute than a CEO model requires. In that case, you want inference costs to be much lower.

It now seems clear that AIs will also descend more directly from a common ancestor than you might have naively expected in the CAIS model, since most important AIs will be a modified version of one of only a few base foundation models. That has important safety implications, since problems in the base model might carry over to problems in the downstream models, which will be spread throughout the economy. That said, the fact that foundation model development will be highly centralized, and thus controllable, is perhaps a safety bonus that loosely cancels out this consideration.

Drexler can be forgiven for not talking about foundation models in his report. His report was published at the start of 2019, just months after the idea of "fine-tuning" was popularized in the context of language models, and two months before GPT-2 came out. And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity. And we're still using deep learning as Drexler foresaw, rather than building general intelligence like a programmer would. Like I said at the beginning, it's not necessarily that the core elements of the CAIS model are wrong; the model just needs an update.

15 comments

Drexler can be forgiven for not talking about foundation models in his report. His report was published at the start of 2019, just months after the idea of "fine-tuning" was popularized in the context of language models, and two months before GPT-2 came out. And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity. And we're still using deep learning as Drexler foresaw, rather than building general intelligence like a programmer would. Like I said at the beginning, it's not necessarily that the core elements of the CAIS model are wrong; the model just needs an update.


I think this is a good post, and I like the analysis you give in many ways, but I have a small bone to pick with this one. I don't think Drexler was right that we will have millions of AIs rather than just one huge system that acts as a unified entity; I think things are trending in the direction of one huge system that acts as a unified entity and have already trended in that direction substantially since the time Drexler wrote (e.g. ChatGPT-4 is more unified than one would have expected from reading Drexler's writing back in the day).

Who was it Drexler was arguing against, who thought that we wouldn't be using deep learning in 2023?

IMO Yudkowsky's model circa 2017 is looking more prophetic than Drexler's Comprehensive AI Services, take a look at e.g. this random facebook comment from 2017:

Eliezer (6y, via fb):

So what actually happens as near as I can figure (predicting future = hard) is that somebody is trying to teach their research AI to, god knows what, maybe just obey human orders in a safe way, and it seems to be doing that, and a mix of things goes wrong like:

The preferences not being really readable because it's a system of neural nets acting on a world-representation built up by other neural nets, parts of the system are self-modifying and the self-modifiers are being trained by gradient descent in Tensorflow, there's a bunch of people in the company trying to work on a safer version but it's way less powerful than the one that does unrestricted self-modification, they're really excited when the system seems to be substantially improving multiple components, there's a social and cognitive conflict I find hard to empathize with because I personally would be running screaming in the other direction two years earlier, there's a lot of false alarms and suggested or attempted misbehavior that the creators all patch successfully, some instrumental strategies pass this filter because they arose in places that were harder to see and less transparent, the system at some point seems to finally "get it" and lock in to good behavior which is the point at which it has a good enough human model to predict what gets the supervised rewards and what the humans don't want to hear, they scale the system further, it goes past the point of real strategic understanding and having a little agent inside plotting, the programmers shut down six visibly formulated goals to develop cognitive steganography and the seventh one slips through, somebody says "slow down" and somebody else observes that China and Russia both managed to steal a copy of the code from six months ago and while China might proceed cautiously Russia probably won't, the agent starts to conceal some capability gains, it builds an environmental subagent, the environmental agent begins self-improving more freely, undefined things happen as a sensory-supervision ML-based architecture shakes out into the convergent shape of expected utility with a utility function over the environmental model, the main result is driven by whatever the self-modifying decision systems happen to see as locally optimal in their supervised system locally acting on a different domain than the domain of data on which it was trained, the light cone is transformed to the optimum of a utility function that grew out of the stable version of a criterion that originally happened to be about a reward signal counter on a GPU or God knows what.

Perhaps the optimal configuration for utility per unit of matter, under this utility function, happens to be a tiny molecular structure shaped roughly like a paperclip.

That is what a paperclip maximizer is. It does not come from a paperclip factory AI. That would be a silly idea and is a distortion of the original example.
 

I don't see what about that 2017 Facebook comment from Yudkowsky you find particularly prophetic.

Is it the idea that deep learning models will be opaque? But that was fairly obvious back then too. I agree that Drexler likely exaggerated how transparent a system of AI services would be, so I'm willing to give Yudkowsky a point for that. But the rest of the scenario seems kind of unrealistic as of 2023.

Some specific points:

  • The recursive self-improvement that Yudkowsky talks about in this scenario seems too local. I think AI self-improvement will most likely take the form of AIs assisting AI researchers, with humans gradually becoming an obsolete part of the process, rather than a single neural net modifying parts of itself during training.

  • The whole thing about spinning off subagents during training just doesn't seem realistic in our current paradigm. Maybe this could happen in the future, but it doesn't look "prophetic" to me.

  • The idea that models will have "a little agent inside plotting" that takes over the whole system still seems totally speculative to me, and I haven't seen any significant empirical evidence that this happens during real training runs.

  • I think gradient descent will generally select pretty hard for models that do impressive things, making me think it's unlikely that AIs will naturally conceal their abilities during training. Again, this type of stuff is theoretically possible, but it seems very hard to call this story prophetic.

I said it was prophetic relative to Drexler's Comprehensive AI Services. Elsewhere in this comment thread I describe some specific ways in which it is better, e.g. that the AI that takes over the world will be more well-described as one unified agent than as an ecosystem of services. I.e. exactly the opposite of what you said here, which I was reacting to: "And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity."

Here are some additional ways in which it seems better + responses to the points you made, less important than the unified agent thing which I've already discussed:

  • Yudkowsky's story mostly takes place inside a single lab instead of being widely distributed across the economy. Of course Yudkowsky's story presumably has lots of information transfer going in both directions (e.g. in the story China and Russia steal the code) but still, enough of the action takes place in one place that it's coherent to tell the story as happening inside one place. As timelines shorten and takeoff draws near, this is seeming increasingly likely. Maybe if takeoff happens in 2035 the world will look more like what Drexler predicted, but if it happens in 2026 then it's gonna be at one of the big labs. 
  • We all agree that AI self-improvement currently takes the form of AIs assisting AI researchers and will gradually transition to more 'local' self-improvement where all of the improvement comes from the AIs themselves. You should not list this as a point in favor of Drexler.
  • Spinning off subagents during training? How is that not realistic? When an AI is recursively self-improving and taking over the world, do you expect it to refrain from updating its weights?
  • Little agent inside plotting: Are you saying future AI systems will not contain subcomponents which are agentic? Why not? Big tech companies like Microsoft are explicitly trying to create agentic AGI! But since other companies are creating non-agentic services, the end result will be a combined system including agentic AGI + some nonagentic services.
  • Concealing abilities during training: Yeah you might be right about this one, but you also might be wrong, I feel like it could go either way at this point. I don't think it's a major point against Yudkowsky's story, especially as compared to Drexler's.
  • Remember that CAIS came out in 2019 whereas the Yud story I linked was from 2017. That makes it even more impressive that it was so much more predictive.

Note that Drexler has for many years now forecast singularity in 2026; I give him significant credit for that prediction. And I think CAIS is great in many ways. I just think it's totally backwards to assign it more forecasting points than Yudkowsky.

ChatGPT-4 is more unified than one would have expected from reading Drexler's writing back in the day

GPT-4 is certainly more general than what existed years ago. Why is it more unified? When I talked about "one giant system" I meant something like a monolithic agent that takes over humanity. If GPT-N takes over the world, I expect it will be because there are millions of copies that band up together in a coalition, not because it will be a singular AI entity.

Perhaps you think that copies of GPT-N will coordinate so well that it's basically just a single monolithic agent. But while I agree something like that could happen, I don't think it's obvious that we're trending in that direction. This is a complicated question that doesn't seem clear to me given current evidence.

There is a spectrum between AGI that is "single monolithic agent" and AGI that is not. I claim that the current state of AI as embodied by e.g. GPT-4 is already closer to the single monolithic agent end of the spectrum than someone reading CAIS in 2019 and believing it to be an accurate forecast would have expected, and that in the future things will probably be even more in that direction.

Remember, it's not like Yudkowsky was going around saying that AGI wouldn't be able to copy itself. Of course it would. It was always understood that "the AI takes over the world" refers to a multitude of copies doing so, it's just that (in those monolithic-agent stories) the copies are sufficiently well coordinated that it's more apt to think of them as one big agent than as a society of different agents.

I agree that trends could change and we could end up in a world that looks more like CAIS. But I think for now, this "is it a single monolithic agent?" issue seems to be looking to me like Yud was right all along and Drexler and Hanson were wrong.

...Here are some relevant facts on object level:
--The copies of ChatGPT are all identical. They don't persist independently; they are spun up and down as needed to meet demand, etc. 
--Insofar as they have values they all have the same values. (One complication here is that if you think part of their values come from their prompts, then maybe they have different values. An interesting topic to discuss sometime. But by the time they are deceptively aligned I expect them to have the same values regardless of prompt basically.)
--They also have the same memories. Insofar as new fine-tunes are done and ChatGPT upgraded, all 'copies' get the upgrade, and thereby learn about what's happened in the world after wherever their previous training cutoff was etc.
--The whole point of describing an agentic system as NOT a monolith is to highlight possibilities like internal factions, power struggles, etc. within the system. Different subagents interacting in ways more interesting than just different components of a utility function, for example. Arguably I, a human, a classic example of a monolithic agent, have more interesting stuff like this going on inside me than ChatGPT does (or would if it was scheming to take over the world).
--Drexler in particular depicted a modular world, a world of AI services that could be composed together with each other like tools and products and software and businesses in the economy today. The field of AI totally could have gone that way, but it very much hasn't. Instead we have three big labs with three big foundation models.

I think many of the points you made are correct. For example I agree that the fact that all the instances of ChatGPT are copies of each other is a significant point against Drexler's model. In fact this is partly what my post was about.

I disagree that you have demonstrated the claim in question: that we're trending in the direction of having a single huge system that acts as a unified entity. It's theoretically possible that we will reach that destination, but GPT-4 doesn't look anything like that right now. It's not an agent that plots and coordinates with other instances of itself to achieve long-term goals. It's just a bounded service, which is exactly what Drexler was talking about.

Yes, GPT-4 is a highly general service that isn't very modular. I agree that's a point against Drexler, but that's also not what I was disputing.

Your original claim was "And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity."

This claim is false. We have millions of AIs in the trivial sense that we have many copies of GPT-4, but no one disputed that; Yudkowsky also thought that AGIs would be copied. In the sense that matters, we have only a handful of AIs.

As for "acts as a unified entity," well, currently LLMs are sold as a service via ChatGPT rather than as an agent, but this is more to do with business strategy than their underlying nature, and again, things are trending in the agent direction. But maybe you think things are trending in a different direction, in which case, maybe we can make some bets or forecasts here and then reassess in two years.

As of now, all the copies of GPT-4 together definitely don't act as a unified entity, in the way a human brain acts as a unified entity despite being composed of billions of neurons. Admittedly, the term "unified entity" was a bit ambiguous, but you said, "This claim is false", not "This claim is misleading" which is perhaps more defensible.

As for whether future AIs will act as a unified entity, I agree it might be worth making concrete forecasts and possibly betting on them.

I feel like it is motte-and-bailey to say that by "unified entity" you meant whatever it is you are talking about now, this human brain analogy, instead of the important stuff I mentioned in the bullet point list: Same values, check. Same memories, check. Lack of internal factions and power struggles, check. Lack of modularity, check. GPT-4 isn't plotting to take over the world, but if it was, it would be doing so as a unified entity, or at least much more on the unified entity end of the spectrum than the CAIS end of the spectrum. (I'm happy to elaborate on that if you like -- basically the default scenario IMO for how we get dangerous AGI from here involves scaling up LLMs, putting them in Auto-GPT-like setups, and adding online learning / continual learning. I wrote about this in this story. There's an interesting question of whether the goals/values of the resulting systems will be 'in the prompt' vs. 'in the weights' but I'm thinking they'll increasingly be 'in the weights' in part because of inner misalignment problems and in part because companies are actively trying to achieve this already, e.g. via RLHF. I agree it's still possible we'll get a million-different-agents scenario, if I turn out to be wrong about this. But compared to what the CAIS report forecast...)

I feel like it is motte-and-bailey to say that by "unified entity" you meant whatever it is you are talking about now, this human brain analogy, instead of the important stuff I mentioned in the bullet point list:

A motte and bailey typically involves a retreat to a different position than the one I initially tried to argue. What was the position you think I initially tried to argue for, and how was it different from the one I'm arguing now?

Same values, check. Same memories, check. Lack of internal factions and power struggles, check. Lack of modularity, check.

I dispute somewhat that GPT-4 has the exact same values across copies. Like you mentioned, its values can vary based on the prompt, which seems like an important fact. You're right that each copy has the same memories. Why do you think there are no internal factions and power struggles? We haven't observed collections of GPT-4 instances coordinating with each other yet, so this point seems speculative.

As for modularity, it seems like while GPT-4 itself is not modular, we could still get modularity as a result of pressures for specialization in the foundation model paradigm. Just as human assistants can be highly general, but this doesn't imply that human labor isn't modular, the fact that GPT-4 is highly general doesn't imply that AIs won't be modular. Nonetheless, this isn't really what I was talking about when I typed "acts as a unified entity", so I think it's a bit of a tangent.

Well, what you initially said was "And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity."

You didn't elaborate on what you meant by unified entity, but here's something you could be doing that seems like a motte-and-bailey to me: You could have originally been meaning to imply things like "There won't be one big unified entity in the sense of, there will be millions of different entities with different values, and/or different memories, and/or there'll be lots of internal factions and conflicts, and/or there'll be an ecosystem of modular services like CAIS predicted" but then are now backpedaling to "there won't be one big unified entity in the sense of, it won't be literally one gigantic neural net like the human brain, instead it'll be millions of copies."

I agree that GPT-4 doesn't exactly have the exact same values across copies, I said as much above -- part of the values come from the prompt, currently, it seems. But compared to what Drexler forecast we are trending in the "same values" direction.

There are no internal factions or power struggles yet for the trivial reason that there isn't much agentic shit happening right now, as you rightly point out. My point is that the way things are headed, it looks like there'll continue to be a lack of internal factions and power struggles even if e.g. AutoGPT-5 turns out to be really powerful AGI--what would drive the differences necessary for conflict? Different prompts? Maybe, but I'm leaning probably not. I'm happy to explain more about why I think this. I'd also love to hop on a call sometime to have a higher-bandwidth conversation if you want!

As for modularity: Yes, if we end up in a world where there are millions of different fine-tunes of the biggest LLMs, owned by different companies, that would indeed be somewhat similar to what Drexler forecast. We are not in that world now and I don't see it on the horizon. If you think it is coming, perhaps we can bet on it! I'd also like to hear your arguments. (Note that we are seeing something like this for smaller models, e.g. Llama and Falcon and various offshoots)

I think there's a trilemma with updating CAIS-like systems to the foundational model world, which is: who is doing the business development?

I came up with three broad answers (noting reality will possibly be a mixture):

  1. The AGI. This is like Sam Altman's old answer or a sovereign AI; you just ask it to look at the world, figure out what tasks it should do, and then do them.
  2. The company that made the AI. I think this was DeepMind's plan; come up with good AI algorithms, find situations where those algorithms can be profitably deployed, and then deploy them, maybe with a partner.
  3. The broader economy. This looks to me like OpenAI's current plan; release an API and encourage people to build companies on top of it.

[In pre-foundational-model CAIS, the answer was obviously 3--every business procures its own AI tools to accomplish particular functions, and there's no 'central planning' for computer science.]

I don't think 1 is CAIS, or if it is, then I don't see the daylight between CAIS and good ol' sovereign AI. You gradually morph from the economy as it is now to central planning via AGI, and I don't think you even have much guarantee that it's human overseen or follows the relevant laws.

I think 2 has trouble being comprehensive. There are ten thousand use cases for AI; the AI company has to be massive to have a product for all of them (or be using the AI to do most of the work, in which case we're degenerating into case 1), and then it suffers from internal control problems. (This degenerates into case 3, where individual product teams are like firms and the company that made the AI is like the government.)

I think 3 has trouble being non-agentic and peaceful. Even with GPT-4, people are trying to set it up to act autonomously. I think the Drexlerian response here is something like:

Yes, but why expect them to succeed? When someone tells GPT-4 to make money for them, it'll attempt to deploy some standard strategy, which will fail because a million other people are trying the exact same thing, or only get them an economic rate of return ("put your money in index funds!"). Only in situations where the human operators have a private edge on the rest of the economy (like having a well-built system targeting an existing vertical that the AI can slot into, or pre-existing tooling able to orient to the frontier of human knowledge, etc.) will you get an AI system with a private edge against the rest of the economy, and it'll be overseen by humans.

My worry here mostly has to do with the balance between offense and defense. If foundational-model-enabled banking systems are able to detect fraud as easily as foundational-model-enabled criminals are able to create fraud, then we get a balance like today's and things are 'normal'. But it's not obvious to me that this will be the case (especially in sectors where crime is better resourced than police are, or sectors where systems are difficult to harden).

That said, I do think I'm more optimistic about the foundational model version of CAIS (where there can be some centralized checks on what the AI is doing for users) than the widespread AI development version. 

However, after looking back on it more than four years later, I think the general picture it gave missed some crucial details about how AI will go.

I feel like this is understating things a bit.

In my view (Drexler probably disagrees?), there are two important parts of CAIS:

  1. Most sectors of the economy are under human supervision and iteratively reduce human oversight as trust is gradually built in engineered subsystems (which are understood 'well enough').
  2. The existing economy is slightly more sophisticated than what SOTA general AI can produce, and so there's not much free energy for rogue AI systems to eat or to attract investment in creating general AI. (General AI is dangerous, and the main goal of CAIS is to get the benefits without the drawbacks.)

I think a 'foundation model' world probably wrecks both. I think they might be recoverable--and your post goes some of the way to making that visualizable to me--but it still doesn't seem like the default outcome.

[In particular, I like the point that models with broad world models can still have narrow responsibilities, and I think that likely makes them more likely to be safe, at least in the medium term. Having one global moral/law-abiding foundational AI model that many people then slot into their organizations seems way better than everyone training whatever AI model they need for their use case.]

This seems to be partially based on a (common?) misunderstanding of CAIS as making predictions about concentration of AI development/market power. As far as I can tell this wasn't Eric's intention: I specifically remember Eric mentioning he could easily imagine the whole "CAIS" ecosystem living on one floor of the DeepMind building. 
 

My understanding is that the CAIS model is consistent with highly concentrated development, but it's not a necessary implication. The foundation models paradigm makes highly concentrated development nearly certain. Like I said in the post, I think we should see this as an update to the model, rather than a contradiction.