We might be dropping the ball on Autonomous Replication and Adaptation.

Épiphanie Gédéon


May 31, 2024

Richard_Ngo

I think the opposite: ARA is just not a very compelling threat model in my mind. The key issue is that AIs that do ARA will need to be operating at the fringes of human society, constantly fighting off the mitigations that humans are using to try to detect them and shut them down. While doing all that, in order to stay relevant, they'll need to recursively self-improve at the same rate at which leading AI labs are making progress, but with far fewer computational resources. Meanwhile, if they grow large enough to be spending serious amounts of money, they'll need to somehow fool standard law enforcement and general societal scrutiny.

Superintelligences could do all of this, and ARA of superintelligences would be pretty terrible. But for models in the broad human or slightly-superhuman ballpark, ARA seems overrated, compared with threat models that involve subverting key human institutions. Remember, while the ARA models are trying to survive, there will be millions of other (potentially misaligned) models being deployed deliberately by humans, including on very sensitive tasks (like recursive self-improvement). These seem much more concerning.

Why then are people trying to do ARA evaluations? Well, ARA was originally introduced primarily as a benchmark rather than a threat model. I.e. it's something that roughly correlates with other threat models, but is easier and more concrete to measure. But, predictably, this distinction has been lost in translation. I've discussed this with Paul and he told me he regrets the extent to which people are treating ARA as a threat model in its own right.

Separately, I think the "natural selection favors AIs over humans" argument is a fairly weak one; you can find some comments I've made about this by searching my Twitter.

ryan_greenblatt

Hmm, I agree that ARA is not that compelling on its own (as a threat model). However, it seems to me like ruling out ARA is a relatively natural way to mostly rule out relatively direct danger. And, once you do have ARA ability, you just need some moderately potent self-improvement ability (including training successor models) for the situation to look reasonably scary. Further, it seems somewhat hard to do capabilities evaluations that rule out this self-improvement if models are ARA capable, given that there are so many possible routes.

Edit: TBC, I think it's scary mostly because we might want to slow down, and rogue AIs might then outpace AI labs.

So, I think I basically agree with where you're at overall, but I'd go further than "it's something that roughly correlates with other threat models, but is easier and more concrete to measure" and say "it's a reasonable threshold to use to (somewhat) bound danger", which seems worth noting.

While doing all that, in order to stay relevant, they'll need to recursively self-improve at the same rate at which leading AI labs are making progress, but with far fewer computational resources

I agree this is probably an issue for the rogue AIs. Bu...

Richard_Ngo

This is what I meant by "ARA as a benchmark"; maybe I should have described it as a proxy instead. Though while I agree that ruling out ARA rules out most danger, I think that's because it's just quite a low bar. The sort of tasks involved in buying compute, etc., are ones most humans could do. Meanwhile, more plausible threat models involve expert-level or superhuman hacking. So I expect a significant gap between ARA and those threat models. You'd need either really good ARA or really good self-improvement ability for an ARA agent to keep up with labs given the huge compute penalty they'll face, unless there's a big slowdown. And if we can coordinate on such a big slowdown, I expect we can also coordinate on massively throttling potential ARA agents.

ryan_greenblatt

Yep, I was imagining a big slowdown or uncertainty in how explosive AI R&D will be. I agree that this cuts the risk, but I don't think it fully eliminates it. I think it might be considerably harder to fully cut off ARA agents than to do a big slowdown. One way to put this is that quite powerful AIs in the wild might make slowing down 20% less likely to work (conditional on a serious slowdown effort), which seems like a decent cost to me. Part of my view here is that ARA agents will have unique affordances that no human organization will have had before (like having truly vast, vast amounts of pretty high-skill labor). Of course, AI labs will also have access to vast amounts of high-skill labor (from possibly misaligned AIs). I don't know how the offense-defense balance goes here. I think full defense might require doing really crazy things that organizations are unwilling to do (e.g. unleashing a huge number of controlled AI agents which may commit crimes in order to take free energy). My guess is that you need to be a decent but not amazing software engineer to do ARA. So, I wouldn't describe this as "tasks most humans can do".

Richard_Ngo

The more labor they have, the more detectable they are, and the easier they are to shut down. Also, are you picturing them gaining money from crimes, then buying compute legitimately? I think the "crimes" part is hard to stop but the "paying for compute" part is relatively easy to stop. Yeah, you're probably right. I still stand by the overall point though.

Hjalmar_Wijk

I think in our current situation shutting down all rogue AI operations might be quite tough (though limiting their scale is certainly possible, and I agree with the critique that absent slowdowns or alignment problems or regulatory obstacles etc. it would be surprising if these rogue agents could compete with legitimate actors with many more GPUs).

Assuming the AI agents have money, there are maybe three remaining constraints the AI agents need to deal with:

* purchasing/renting the GPUs,
* (if not rented through a cloud provider) setting them up and maintaining them, and
* evading law enforcement or other groups trying to locate them and shut them down.

For acquiring the GPUs, there are currently a ton of untracked GPUs spread out around the world, both gaming GPUs like 4090s and datacenter GPUs like H100s. I can buy or rent them with extremely minimal KYC from tons of different places. If we assume ARA agents are able to do all of the following:

* Recruit people to physically buy GPUs from retail stores
* Take delivery of online-ordered GPUs in anonymized ways using intermediaries
* Recruit people to set up fake AI startups and buy GPUs through them

then the effort required to prevent them from acquiring any GPUs in a world anything like today seems crazy. And even if the US + allies started confiscating or buying up 4090s en masse, it's enough that some other countries adopt laxer standards (perhaps to attract AI talent that is frustrated by the new draconian GPU-control regime).

As for setting up acquired GPUs, I think the AI agents will probably be able to find colocation operators that don't ask too many questions in many parts of the world, but even if the world were to coordinate on extremely stringent controls, the AI agents could set up datacenters of their own in random office buildings or similar. Each inference setup wouldn't need that much power, and I think it would be very hard to track it down.

As for not being shut down by law enforcement

ryan_greenblatt

I agree that this is a fairly strong argument that this AI agent wouldn't be able to cause problems while rogue. However, I think there is also a concern that this AI will be able to cause serious problems via the affordances it is granted through the AI lab. In particular, AIs might be given huge affordances internally with minimal controls by default. And this might pose a substantial risk even if AIs aren't quite capable enough to pull off the cybersecurity operation. (Though it seems not that bad of a risk.)

Hjalmar_Wijk

Yeah that's right, I made too broad a claim and only meant to say it was an argument against their ability to pose a threat as rogue independent agents.

ryan_greenblatt

Both legitimately and illegitimately acquiring compute could plausibly be the best route. I'm uncertain. It doesn't seem that easy to lock down legitimate compute to me? I think you need to shut down people buying/renting 8xH100-style boxes. This seems potentially quite difficult. The model weights probably don't fit in VRAM in a single 8xH100 device (the model weights might be 5 TB when 4-bit quantized), but you can maybe fit them across 8-12 of these. And you might not need amazing interconnect (e.g. normal 1 GB/s datacenter internet is fine) for somewhat high-latency (but decently good throughput) pipeline-parallel inference. (You only need to send the residual stream per token.) I might do a BOTEC on this later. Unless you're aware of a BOTEC on this?
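A rough way to check the "8-12 of these" estimate is to divide the quantized weight size by the VRAM of one 8xH100 box. The sketch below assumes 80 GB of memory per H100 and decimal units (neither figure is stated in the comment) and counts weights only; KV cache, activations, and runtime overhead would push a real deployment toward the upper end of the range.

```python
import math

# Assumptions (not stated in the thread): 80 GB of HBM per H100, 8 GPUs per box,
# and decimal units (1 TB = 1000 GB).
GB_PER_H100 = 80
GPUS_PER_BOX = 8


def boxes_for_weights(weights_tb: float) -> int:
    """Minimum number of 8xH100 boxes whose combined VRAM holds the weights alone.

    Ignores KV cache, activations, and runtime overhead, so it is a lower bound.
    """
    vram_per_box_gb = GB_PER_H100 * GPUS_PER_BOX  # 640 GB per box
    return math.ceil(weights_tb * 1000 / vram_per_box_gb)


if __name__ == "__main__":
    # ~5 TB of 4-bit weights (roughly 10T parameters at 0.5 bytes each)
    print(boxes_for_weights(5.0))  # prints 8
```

Under the same assumptions, a ~10 TB fp8 copy of the same model needs at least 16 boxes for weights alone, which leaves some headroom if it is split across 20 boxes.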

Hjalmar_Wijk

I did some BOTECs on this and think 1 GB/s is sort of borderline: it probably works, but not obviously. E.g. I assumed a ~10 TB (at fp8) MoE model with a sparsity factor of 4 and a hidden size of 32768. With 32 kB per token you could send at most ~30k tokens/second over a 1 GB/s interconnect. Not quite sure what a realistic utilization would be, but maybe we halve that to 15k? If the model was split across 20 8xH100 boxes, then each box might do ~250 GFLOP/token (2 * 10T parameters / (4 * 20)), so each box would do at most 3.75 PFLOP/second, which might be about ~20-25% utilization. This is not bad, but for a model with much more sparsity, GPUs with a different FLOP/s : VRAM ratio, a spottier connection, etc., the bandwidth constraint might become quite harsh. (The above is somewhat hastily reconstructed from some old sheets; I might have messed something up.)
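The chain of estimates above can be reproduced as a short script. Every input except the per-GPU peak comes from the comment; the ~1,979 TFLOP/s dense FP8 figure per H100 SXM is an assumption added here to turn FLOP/s into a utilization, so the result is an order-of-magnitude check rather than a measurement.

```python
# Reconstruction of the 1 GB/s pipeline-parallel BOTEC. All inputs come from the
# comment except H100_FP8_DENSE_FLOPS, which is an assumed approximate peak.
HIDDEN_SIZE = 32768          # residual-stream width sent between pipeline stages
BYTES_PER_VALUE = 1          # fp8 activations
LINK_BYTES_PER_S = 1e9       # 1 GB/s interconnect between boxes
LINK_EFFICIENCY = 0.5        # "maybe we halve that"
TOTAL_PARAMS = 10e12         # ~10 TB at fp8 -> ~10T parameters
SPARSITY = 4                 # MoE: ~1/4 of parameters active per token
N_BOXES = 20                 # model split across 20 8xH100 boxes
GPUS_PER_BOX = 8
H100_FP8_DENSE_FLOPS = 1.979e15  # assumption: approx. dense FP8 peak of one H100 SXM

bytes_per_token = HIDDEN_SIZE * BYTES_PER_VALUE        # 32,768 B, i.e. ~32 kB
max_tokens_per_s = LINK_BYTES_PER_S / bytes_per_token  # ~30.5k tokens/s
tokens_per_s = max_tokens_per_s * LINK_EFFICIENCY      # ~15k tokens/s

# 2 FLOP per active parameter per token, with active parameters split evenly across boxes
flops_per_token_per_box = 2 * TOTAL_PARAMS / (SPARSITY * N_BOXES)  # 250 GFLOP
flops_per_s_per_box = tokens_per_s * flops_per_token_per_box       # ~3.8 PFLOP/s
utilization = flops_per_s_per_box / (GPUS_PER_BOX * H100_FP8_DENSE_FLOPS)

print(f"bytes/token:         {bytes_per_token}")
print(f"max tokens/s:        {max_tokens_per_s:,.0f}")
print(f"tokens/s at 50%:     {tokens_per_s:,.0f}")
print(f"GFLOP/token per box: {flops_per_token_per_box / 1e9:.0f}")
print(f"PFLOP/s per box:     {flops_per_s_per_box / 1e15:.2f}")
print(f"utilization:         {utilization:.0%}")
```

Using the unrounded ~15.3k tokens/s gives about 3.8 PFLOP/s per box (versus 3.75 with the rounded 15k) and roughly 24% utilization, inside the 20-25% range. Raising `SPARSITY` or lowering `LINK_BYTES_PER_S` shows how quickly the bandwidth limit starts to dominate.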

Donald Hobson

If AI labs are slamming on the recursive self-improvement ASAP, it may be that Autonomous Replicating Agents are irrelevant. But that's an "ARA can't destroy the world if AI labs do it first" argument.

ARA may well have more compute than AI labs, especially if the AI labs are trying to stay within the law and the ARA is stealing any money/compute that it can hack its way into (which could be >90% of the internet if it's good at hacking).

there will be millions of other (potentially misaligned) models being deployed deliberately by humans, includi

...

Charbel-Raphaël

Thanks for this comment, but I think this might be a bit overconfident.

constantly fighting off the mitigations that humans are using to try to detect them and shut them down.

Yes, I have no doubt that if humans implement some kind of defense, this will slow down ARA a lot. But:

1) It’s not even clear people are going to try to react in the first place. As I say, most AI development is positive. If you implement regulations to fight bad ARA, you are also hindering the whole ecosystem. It’s not clear to me that we are going to do something about open

...

Richard_Ngo

I think this just depends a lot on how large-scale they are. If they are using millions of dollars of compute, and are effectively large-scale criminal organizations, then there are many different avenues by which they might get detected and suppressed.

A world which can pause AI development is one which can also easily throttle ARA AIs.

This seems like a weak central point. "Pretty annoying" and some people dying is just incredibly small compared with the benefits of AI. And "it might be a problem in an indefinite number of years" doesn't justify the strength of the claims you're making in this post, like "we are approaching a point of no return" and "without a treaty, we are screwed".

An extended analogy: suppose the US and China both think it might be possible to invent a new weapon far more destructive than nuclear weapons, and they're both worried that the other side will invent it first. Worrying about ARAs feels like worrying about North Korea's weapons program. It could be a problem in some possible worlds, but it is always going to be much smaller, it will increasingly be left behind as the others progress, and if there's enough political will to solve the main problem (US and China racing) then you can also easily solve the side problem (e.g. by China putting pressure on North Korea to stop).

Link here, and there are other comments in the same thread. I was on my laptop, which has Twitter blocked, so I couldn't link it myself before.

ryan_greenblatt

I push back on this somewhat in a discussion thread here. (As a pointer to people reading through.) Overall, I think this is likely to be true (maybe 60% likely), but not enough that we should feel totally fine about the situation.

Charbel-Raphaël

I agree that's a bit too much, but it seems to me that we're not at all on the way to stopping open-source development, and that we need to stop it at some point. Maybe you think ARA is a bit early, but I think we need a red line before AI becomes human-level, and ARA is one of the last arbitrary red lines before everything accelerates. But I still think the "point of no return" framing (loss of control because it might be very hard to stop an ARA agent) seems pretty fair. I agree with your comment on Twitter that evolutionary forces are very slow compared to deliberate design, but that is not what I wanted to convey (that's my fault). I think an ARA agent would depend not only on evolutionary forces but also on the whole open-source community finding practical new ways to quantize, prune, distill, and run the model in a distributed way. I think the main driver of this "evolution" would be the open-source community and libraries that will want to create good "ARA", and huge economic incentives will make AI agents more and more common and easy in the future.


Gordon Seidoh Worley

According to METR, the organization that audited OpenAI, a dozen tasks indicate ARA capabilities.

Small comment, but @Beth Barnes of METR posted on LessWrong just yesterday to say "We should not be considered to have ‘audited’ GPT-4 or Claude".

This doesn't appear to be a load-bearing point in your post, but would still be good to update the language to be more precise.

Jonathan Claybrough

Might be good to have a dialogue format with other people who agree/disagree to flesh out scenarios and countermeasures.

Charbel-Raphaël

Why not! There are many, many questions that were not discussed here because I just wanted to focus on the core part of the argument. But I agree details and scenarios are important, even if I don't think they should change the basic picture depicted in the OP much.

Here are some important questions that were deliberately omitted from the Q&A for the sake of not including stuff that fluctuates too much in my head:

Would we react before the point of no return?
Where should we place the red line? Should this red line apply to labs?
Is this going to be exponential? Do we care?
What would it look like if we used a counter-agent that was human-aligned?
What can we do about it now concretely? Is KYC something we should advocate for?
Don’t you think an AI capable of ARA would be superintelligent and take over anyway?
What are the short-term bad consequences of early ARA? What does the transition scenario look like?
Is it even possible to coordinate worldwide if we agree that we should?
How much human involvement will be needed in bootstrapping the first ARAs?

We plan to write more about these with @Épiphanie Gédéon in the future, but first it's necessary to discuss the basic picture a bit more.


AI ALIGNMENT FORUM
AF
