AI ALIGNMENT FORUM

AI Takeoff · Computer Security & Cryptography · AI

How much novel security-critical infrastructure do you need during the singularity?

by Buck Shlegeris
4th Jul 2025
6 min read

Comments
Ryan Greenblatt:

“Adopting new hardware will require modifying security-critical code”

Another concern is that AI companies (or the AI company) will rapidly buy a bunch of existing hardware (GPUs, other accelerators, etc.) during the singularity, and handling this hardware will require many infrastructure changes in a short period of time. New infrastructure might be needed to handle highly heterogeneous clusters built out of a bunch of different GPUs/CPUs/etc. (potentially including gaming GPUs) bought in a hurry. AI companies might buy the hardware of other AI companies, and it might be non-trivial to adapt that hardware to the buyer’s setup.

Rohin Shah:

“Technologies that allow workers to be more isolated from each other gain you both convenience (because your coworkers no longer accidentally mess up what you’re doing) and also security (because you can remove your coworker’s permission to affect the code you’re running), but generally reduce efficiency. When we try to buy efficiency at the cost of convenience, we might lose security too.”

Hmm, this feels less likely to me. Isolation can often be an efficiency benefit because one employee's mistake doesn't propagate to screwing up the work of everyone else, and this benefit scales as the number of employees increases. This assumes that labor is at least somewhat unreliable, but I do expect that AI labor will be overall unreliable, just because reliable AI labor will likely be much more expensive and won't look cost-effective to deploy in most situations.

I’d expect that you buy efficiency by building more fine-grained configuration mechanisms and requiring that employees use them. This will also become easier with AI, since you can have specialized AI assistants that understand the configuration mechanisms well and can set them for all experiments (unlike the current status quo, where all your employees have to be separately trained on the configurations, and many of them forget and just copy configurations from elsewhere).
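To make this concrete, here is a minimal sketch of what such a fine-grained, machine-checkable experiment configuration might look like (Python; all names, paths, and policies are hypothetical illustrations, not any lab’s actual infrastructure). The idea is that an AI assistant could fill in such a config and a policy check could validate it before anything runs:

```python
# Hypothetical sketch: a per-experiment config that makes resource use and data
# access explicit, so an assistant or a policy engine can check it automatically.
from dataclasses import dataclass, field


@dataclass
class ExperimentConfig:
    owner: str                                   # which agent/team submitted this
    gpu_hours: float                             # requested compute budget
    readable_paths: list[str] = field(default_factory=list)
    writable_paths: list[str] = field(default_factory=list)
    network_access: bool = False                 # most experiments shouldn't need egress


def validate(cfg: ExperimentConfig, max_gpu_hours: float = 512.0) -> list[str]:
    """Return a list of policy violations; an empty list means the config passes."""
    problems = []
    if cfg.gpu_hours > max_gpu_hours:
        problems.append(f"requests {cfg.gpu_hours} GPU-hours > cap {max_gpu_hours}")
    for path in cfg.writable_paths:
        if not path.startswith(f"/experiments/{cfg.owner}/"):
            problems.append(f"writes outside own namespace: {path}")
    if cfg.network_access:
        problems.append("network egress requires separate approval")
    return problems


cfg = ExperimentConfig(owner="agent-17", gpu_hours=64,
                       readable_paths=["/datasets/public/webtext/"],
                       writable_paths=["/experiments/agent-17/run-003/"])
print(validate(cfg) or "config OK")
```

The specific fields are stand-ins; the point is that the configuration is structured enough for another system (or reviewer) to check mechanically.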

In general it seems like this should improve security by creating more info and legibility around what is happening (which in turn gives you more affordances to enforce security). Some examples (not necessarily about AI):

  1. For code that is costly to run (e.g. uses lots of compute, causes physical-world effects, affects production systems), spending labor to run automated bug finding tools that generate lots of false positives, and addressing the results. This directly increases security by having fewer bugs.
  2. Breaking up big jobs into many smaller ones that are easier to schedule across your compute fleet. This can enable you to isolate small jobs from each other which reduces the attack surface.
  3. Requiring detailed analysis and justification before using significant amounts of compute for experiments. This should increase security against rogue internal deployments, as it becomes easier to catch the AI in a lie.
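As a toy illustration of the third item (the threshold, word count, and names are all hypothetical), a compute-request gate might look something like this:

```python
# Hypothetical sketch: large compute requests are only schedulable if they carry
# a written justification and an explicit reviewer sign-off.
LARGE_JOB_THRESHOLD_GPU_HOURS = 1000.0


def may_schedule(gpu_hours: float, justification: str = "",
                 approved_by: str | None = None) -> bool:
    """Small jobs pass automatically; large ones need a justification and approval."""
    if gpu_hours < LARGE_JOB_THRESHOLD_GPU_HOURS:
        return True
    return len(justification.split()) >= 50 and approved_by is not None


assert may_schedule(40.0)                                       # small job: fine
assert not may_schedule(5000.0, justification="scaling sweep")  # big job, no review: rejected
```

The justification text is also exactly the kind of artifact that can later be audited when checking whether an AI lied about what the compute was for.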

That said, I did think of one example that worsens security -- switching from a memory-safe language like Python to one that requires manual memory management like C. Though in this case, it's that the "security infra" (automatic memory management) itself is too compute-expensive to run, which doesn't seem like it will apply in the AI case.

Overall my very weak guess is that the net effect will be to increase security. Of course, the part where you build more fine-grained configuration mechanisms is "novel (potentially) security-critical infrastructure", so this does support your broader point. (When I say it will increase security, this is under the assumption that the novel infra isn't sabotaged by scheming AIs.)

Buck Shlegeris:

I agree that fine-grained configuration mechanisms probably increase security, and this might net increase security, as I say in the post.

I agree that the increasing value of compute might increase the isolation you use, for the reason you said. One reason I’m skeptical is that you can get almost all of that value by having AIs voluntarily adopt mechanisms that generally isolate their jobs from the other jobs that are running (e.g. naming conventions about who is allowed to read or write what), which get you the reliability benefits without getting any security.
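A toy version of this kind of voluntary convention (paths and names hypothetical): each job agrees to write only under its own prefix, and a lint-style check catches accidental violations, but nothing at the filesystem or scheduler level actually stops a job that ignores the convention:

```python
# Hypothetical sketch of isolation-by-convention: jobs are supposed to write only
# under their own prefix. A check like this catches accidental violations (the
# reliability benefit), but a job that deliberately ignores the convention is not
# actually prevented from writing anywhere (no security benefit).
def convention_violations(job_id: str, write_paths: list[str]) -> list[str]:
    """Return the writes that fall outside the job's own directory."""
    prefix = f"/scratch/{job_id}/"
    return [p for p in write_paths if not p.startswith(prefix)]


print(convention_violations("job-42", ["/scratch/job-42/checkpoints/step_100.pt"]))  # [] -- fine
print(convention_violations("job-42", ["/scratch/job-17/checkpoints/step_100.pt"]))  # flagged
```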

Rohin Shah:

“I agree that fine-grained configuration mechanisms probably increase security, and this might net increase security, as I say in the post.”

You mention permission systems, which is certainly a big deal, but I didn’t see anything about broader configuration mechanisms, many of which can be motivated solely by efficiency and incidentally help with security. (I was disputing your efficiency -> less security claim; permissions mechanisms aren’t a valid counterargument since they aren’t motivated by efficiency.)

“One reason I’m skeptical is that you can get almost all of that value by having AIs voluntarily adopt mechanisms that generally isolate their jobs from the other jobs that are running (e.g. naming conventions about who is allowed to read or write what), which get you the reliability benefits without getting any security.”

Hmm, I think this assumes the AIs will be more reliable than I expect them to be. You would want this to be reliable enough that your (AI or human) software engineers ~never have to consider non-isolation as a possible cause of bugs (unless explicitly flagged otherwise); I'd guess that corresponds to about 6 nines of reliability at following the naming conventions? (Which might get you to 8 nines of reliability at avoiding non-isolation bugs, since most naming convention errors won't lead to bugs.)

Some complete guesses: human software engineers might get 0.5-2 nines of reliability at following the naming conventions, a superhuman coder model might hit ~4 nines of reliability if you made this a medium priority, if you add a review step that explicitly focuses just on checking adherence to naming conventions then you might hit 6 nines of reliability if the critique uses a model somewhat more powerful than a current frontier model. But this is now a non-trivial expense; it's not clear to me it would be worth it.
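To spell out the arithmetic behind these nines (the inputs are rough guesses, including an assumed ~1% of convention errors that actually turn into bugs):

```python
# Back-of-the-envelope for the 'nines' above; nothing here is measured.
import math


def nines(failure_rate: float) -> float:
    """Convert a failure rate into 'nines of reliability' (e.g. 1e-6 -> 6.0)."""
    return -math.log10(failure_rate)


convention_error_rate = 1e-6        # ~6 nines at following the naming conventions
bug_given_error = 0.01              # assumed fraction of convention errors that cause a bug
bug_rate = convention_error_rate * bug_given_error

print(nines(convention_error_rate))   # 6.0 nines at following the convention
print(nines(bug_rate))                # 8.0 nines at avoiding non-isolation bugs
```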

How much novel security-critical infrastructure do you need during the singularity?

I think a lot about the possibility of huge numbers of AI agents doing AI R&D inside an AI company (as depicted in AI 2027). I think particularly about what will happen if those AIs are scheming: coherently and carefully trying to grab power and take over the AI company, as a prelude to taking over the world. And even more particularly, I think about how we might try to mitigate the insider risk posed by these AIs, taking inspiration from traditional computer security, traditional insider threat prevention techniques, and first-principles thinking about the security opportunities posed by the differences between AIs and humans.

To flesh out this situation, I’m imagining something like what AI 2027 forecasts for March 2027:

  • The compute available to the leading AI company has increased 7x compared to today.
  • There are now 200K superhuman coder copies thinking at 30x human speed.
  • These AIs are mostly working on ML R&D, doing tasks like:
    • Generating training data, e.g. RL environments
    • Algorithm development, e.g. finding architecture improvements, implementing neuralese, finding more efficient ways to run iterated distillation and amplification.
    • Performance engineering: finding ways of running inference, training, and other expensive workloads more efficiently.

I want to know what we need to do to ensure that even if those AIs are conspiring against us, they aren’t able to cause catastrophic security failures.[1]

One important question here is how strongly incentivized we are to use the AIs to write security-critical infrastructure code.

The most optimistic possibility here is that the AIs don’t need to write such code: instead of acting as infrastructure engineers, they act with basically the permissions that ML researchers at Google DeepMind have.[2] Something like this: The AIs act as users of the cluster infrastructure, rather than as developers of that infrastructure. When they want to run jobs, they submit their jobs to a scheduler with a certain set of permissions, and the infrastructure ensures that the jobs can’t affect parts of the system they don’t have permission to affect.

But the AIs don’t make lots of contributions to that underlying infrastructure (e.g. the job scheduler or the hypervisors controlling jobs on the servers). They aren’t making huge changes to the codebases owned by the datacenter teams, the infrastructure teams, or the security teams.
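As a rough sketch of this division of labor (all names and policies hypothetical), the agent-facing surface might be little more than a job-submission call, with the permission checks living in trusted scheduler code that the agents never modify:

```python
# Hypothetical sketch of the 'AIs as users, not infrastructure developers' shape.
# Agents only call submit_job; the permission logic lives in trusted code.
ALLOWED_PATH_PREFIXES = {
    "agent-research-pool": ["/datasets/public/", "/experiments/research/"],
}


def submit_job(principal: str, command: str, paths_needed: list[str]) -> str:
    """Trusted scheduler entry point: reject jobs that touch paths the
    submitting principal has no permission for."""
    allowed = ALLOWED_PATH_PREFIXES.get(principal, [])
    for path in paths_needed:
        if not any(path.startswith(prefix) for prefix in allowed):
            return f"rejected: {principal} may not access {path}"
    return f"queued: {command!r} for {principal}"


print(submit_job("agent-research-pool", "python train.py",
                 ["/datasets/public/webtext/"]))          # queued
print(submit_job("agent-research-pool", "python sync.py",
                 ["/secure/model_weights/"]))             # rejected by the trusted layer
```

In this picture, the infrastructure attack surface stays confined to the (human-maintained) scheduler and permission layer, rather than expanding to everything the agents write.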

The more pessimistic possibility is that the AI company needs to have the AIs write a bunch of novel infrastructure code. Under this possibility, the AIs are rewriting your routers, your job scheduler, your OS kernel, your GPU interconnect drivers, and so on. If this is the case, we’re in a much scarier situation, because we have an additional huge and confusing attack surface.

Note that even in the optimistic case where the AIs just have the permissions that ML researchers usually have, and no lower-level permissions at all, the situation seems very dangerous! Most of the threat modeling I’ve done related to AI control is about the risks posed by the “optimistic” situation. (For why I think this is still risky, see footnote[3].) What I’m discussing here is whether there's a big source of risk aside from the ones I normally talk about!

So, how strongly will AI companies be incentivized to do novel infrastructure development during an intelligence explosion? I’m quite unsure. Here are some reasons that we might expect big infrastructure changes, based on conversations with AI infrastructure experts and my own thinking.

Differences between AI and human labor will incentivize massive infra changes

During the intelligence explosion, I expect AI agents to have pretty different properties from human researchers, which might necessitate either developing new infrastructure, or using current infrastructure in ways that degrade security properties.

One particularly important dynamic is: researchers currently make many tradeoffs that reduce performance in return for saving researcher time. For example, they repeatedly rebuild Docker containers, in return for a reduced risk of Python environment issues. The advent of automated engineers and researchers will probably mean[4] we want to shift to a workflow where (like the computer users of the past) the users have much less separation and rely on manual coordination, in order to eke out performance. For example, instead of working on separate computers, they might want to work as different users (or the same user!) on the same Linux server. This would make it way harder for us to enforce a bunch of security invariants (especially stuff about tracking how information flows from agent to agent).

Another way of saying this is: Technologies that allow workers to be more isolated from each other gain you both convenience (because your coworkers no longer accidentally mess up what you’re doing) and also security (because you can remove your coworker’s permission to affect the code you’re running), but generally reduce efficiency. When we try to buy efficiency at the cost of convenience, we might lose security too.

This argument doesn’t suggest that AIs will necessitate development of novel security-critical infrastructure; the concern here is that we’ll need to stop using existing technology that enforces security properties. (And then developing new technology for these new workflows would either be an example of novel datacenter infrastructure as I defined it, or would be impossible to do without losing the performance gains.)

Side-channel prevention poses particularly tricky difficulties along these lines. When there’s a resource shared between multiple users, you inherently have to choose between inefficiency and side-channels. (If there’s a bathroom in your shared house, you’ll learn what time your roommate tends to shower, unless you make a rule that he’s only allowed to use it in the even-numbered hours and you’re only allowed to use it in the odd-numbered hours.) The throughput of the side-channel obviously varies depending on the details, but there are at least some cases (especially sharing memory/CPUs/GPUs) where preventing side-channels seems really hard.
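Here is a toy illustration of that tradeoff (with made-up demand patterns): statically partitioning a shared resource into fixed time slots removes the timing side-channel between two tenants, but wastes capacity whenever the slot owner has nothing to run:

```python
# Toy model of the side-channel vs. efficiency tradeoff for a shared resource.
def utilization(demand_a: list[int], demand_b: list[int], partitioned: bool) -> float:
    """Fraction of hours the resource does useful work; demands are 0/1 per hour."""
    used = 0
    for hour, (a, b) in enumerate(zip(demand_a, demand_b)):
        if partitioned:
            # Even hours belong to A, odd hours to B, regardless of actual demand,
            # so neither tenant learns anything from when the other uses the resource.
            used += a if hour % 2 == 0 else b
        else:
            # Shared scheduling: the hour is used if anyone wants it, but each
            # tenant can now observe the other's usage pattern.
            used += min(1, a + b)
    return used / len(demand_a)


a = [1, 0, 1, 0, 1, 1, 0, 0] * 3   # 24 hours of bursty demand for tenant A
b = [0, 0, 1, 1, 0, 1, 1, 0] * 3   # 24 hours of bursty demand for tenant B
print(utilization(a, b, partitioned=False))  # 0.75  -- higher utilization, side-channel exists
print(utilization(a, b, partitioned=True))   # 0.625 -- side-channel gone, capacity wasted
```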

On the other hand, increased AI labor might mean the AIs can not-that-expensively jump through annoying hoops created by fine-grained permission management, which might improve security (as discussed here).

Adopting new hardware will require modifying security-critical code

It looks like a bunch of security-critical components are currently under active development, so we should expect them to change over time. For example, networking equipment has seen a lot of recent development to support ML workloads, and some of it is designed to be user-configurable (e.g. Juniper’s user-programmable routers; Juniper gives a case study of doing all-reduce in your router (!)). This is evidence that there will be lots of deployed non-mature technology that needs security-critical code, some of which has been specifically designed to allow (and maybe require?) end-user-programmable configuration.

Infra rewrites can get you big performance improvements

AIs might be able to rewrite code to make it much faster. If they can do that for security-critical infrastructure code, you’ll be strongly incentivized to let them do so. I don’t know how important this consideration is because I don’t know how much inefficiency is coming from parts of the stack that are security-critical. My guess is that most improvements to efficiency of training will come from parts of the stack that aren’t security critical (e.g. writing new kernels, developing new architectures), though some parts seem security critical (e.g. low level network configuration). I don’t know how much efficiency gain on other workloads you can get from upgrading security-critical infrastructure.

The AI company builds new security-critical infra really fast for the sake of security

The AI company might worry that off-the-shelf code has vulnerabilities that adversaries know about (either because of offensive cyber research those adversaries have done, or because the code has been backdoored), and so might want to rewrite everything to reduce security risk. AI company employees often propose doing this.

Also, AI companies might write a bunch of security-critical infra code really fast because they suddenly decide to build or physically reconfigure their datacenter really fast in order to increase physical security. These security requirements might rule out the possibility of them using an existing cloud compute provider, potentially necessitating a bunch of novel code.

Conclusion

I still feel pretty unsure about how all this is going to play out.

Talking to infrastructure engineers inside AI companies has been somewhat helpful for clarifying my thoughts here, but they mostly haven’t thought through the futurism part of this, which means that I can’t just rely on them to give me answers.

I’m hoping to think more about these questions over the next few weeks.

  1. ^

    I think the most plausible such security failures are the AI exfiltrating its weights, the AI hacking the datacenters enough that the AI company loses necessary observability into what’s happening inside them, or the AI seizing control of the AI company (perhaps by threatening or bribing company leadership).

  2. ^

    I’m using GDM as the example because they’re part of Google, a company that has famously good security and that doesn’t rely on external cloud providers. The situation seems basically just worse for Anthropic/OpenAI/x.ai.

  3. ^

    To start with, the AIs might be tasked with doing things that can lead to catastrophe without them having to violate any security invariants that AWS enforces on users. One example is that the AIs might be able to launch rogue internal deployments without violating those security invariants, if they already have access to scary model weights. Another is that if the AIs are tasked with generating data for future training runs, that data might be covertly poisoned. The other big issue is that even though AWS has pretty good security, it’s probably not robust to wildly superintelligent AIs putting hundreds of thousands of human-equivalent years of work into hacking it.

  4. ^

    [These numbers are sloppy] In March 2027 in the AI 2027 scenario, the compute available to OpenBrain has increased 7x compared to today, while there are now 200K superhuman coder copies thinking at 30x human speed, for 6M human-coder equivalents, which is like 4 OOMs more coders than it probably had in June 2025. So per coder-equivalent, that’s about 3 OOMs less compute than is available per human coder today.
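Spelling out that arithmetic with the footnote’s own rough numbers:

```python
# Rough arithmetic behind footnote 4 (all inputs are the footnote's sloppy numbers).
import math

compute_growth = 7            # compute available to OpenBrain vs. today
coder_copies = 200_000        # superhuman coder copies
speedup = 30                  # thinking speed relative to a human

human_equivalents = coder_copies * speedup
print(f"{human_equivalents:,} human-coder equivalents")      # 6,000,000

coder_growth = 1e4            # "~4 OOMs more coders" than June 2025
per_coder_compute = compute_growth / coder_growth            # fraction of today's per-coder compute
print(round(math.log10(per_coder_compute), 1), "OOMs")       # -3.2, i.e. ~3 OOMs less per coder
```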