AI ALIGNMENT FORUM
Comparing risk from internally-deployed AI to insider and outsider threats from humans

by Buck Shlegeris
23rd Jun 2025
3 min read

1 comment, from RobertM, on the post below:

> They imagine writing small and carefully locked-down infrastructure and allowing the AIs to interact with it.

That's surprising and concerning.  As you say, if these companies expect their AIs to do end-to-end engineering and R&D tasks internally, it seems difficult to imagine how they could do that without giving those AIs employee-level privileges.  Any place where they don't is a place where humans turn into a bottleneck.  I can imagine a few possible objections to this:

  1. They don't expect those bottlenecks to impose that much of a penalty.
    1. I'm not sure how this could be true unless you think that AI systems will be pretty weak, which is sort of fighting the premise.  What are we worried about, again?
  2. They expect the bottlenecks to impose a large penalty and plan on biting that bullet when the time comes.
    1. I currently roll to disbelieve based on both the publicly-held positions of the relevant organizations and also their historical track records.  Incentives here seem quite bad.
  3. They plan on trying to thread the needle by employing some control schemes where (for example) different "agents" have different permissions.  E.g. a "code writing" agent has read permissions for (some parts of) the codebase, the ability to write, deploy, and test changes to that code in a sandboxed dev environment, and the ability to open a pull request with those changes.  Another set of agents has permissions to review pull requests, and then request changes, approve/merge them, or flag the PR as suspicious.  Yet another set of agents acts as gatekeepers to sensitive data that might be needed for some experiments, but only with good justification.  (A toy sketch of what this might look like follows this list.)
    1. This still suffers from the incentive gradient pushing quite hard to just build end-to-end agents.  Not only will it probably work better, but it'll be straight up cheaper and easier!
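
To gesture at what I mean in option 3, here's a toy sketch of per-role permissions.  All the names (Permission, AGENT_ROLES, is_allowed) are made up for illustration; this isn't anyone's actual access-control layer.

```python
# Toy sketch of option 3: separate agent roles with deliberately disjoint
# permissions. Every name here (Permission, AGENT_ROLES, is_allowed) is invented.
from enum import Enum, auto


class Permission(Enum):
    READ_CODE = auto()            # read (some parts of) the codebase
    WRITE_SANDBOX = auto()        # write/deploy/test in a sandboxed dev environment
    OPEN_PR = auto()              # open a pull request with proposed changes
    REVIEW_PR = auto()            # request changes, approve/merge, or flag a PR
    READ_SENSITIVE_DATA = auto()  # access gated data, only with good justification


# Each role gets only what it needs; no single agent holds every permission.
AGENT_ROLES: dict[str, set[Permission]] = {
    "code_writer": {Permission.READ_CODE, Permission.WRITE_SANDBOX, Permission.OPEN_PR},
    "reviewer": {Permission.READ_CODE, Permission.REVIEW_PR},
    "data_gatekeeper": {Permission.READ_SENSITIVE_DATA},
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Return True iff the given agent role holds the given permission."""
    return permission in AGENT_ROLES.get(role, set())


assert is_allowed("code_writer", Permission.OPEN_PR)
assert not is_allowed("code_writer", Permission.REVIEW_PR)  # separation of duties
```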

Like, to be clear, I would definitely prefer a world where these organizations wrote "small and carefully locked-down infrastructure" as the limited surface their AIs were allowed to interact with; I just don't expect that to actually happen in practice.


I’ve been thinking a lot recently about the relationship between AI control and traditional computer security. Here’s one point that I think is important.

My understanding is that there's a big qualitative distinction between the two ends of a spectrum of security work that organizations do, which I’ll call “security from outsiders” and “security from insiders”.

On the “security from outsiders” end of the spectrum, you have security invariants that you try to maintain entirely by restricting affordances with static, fully automated systems. My sense is that this is most of how Facebook or AWS relates to its users: they want to ensure that, no matter what actions users take through the user interface, those users can't violate fundamental security properties. For example, no matter what text I enter into the "new post" field on Facebook, I shouldn't be able to access the private messages of an arbitrary user. And on AWS, no matter what code I run on my EC2 instance, I shouldn't be able to access someone else's private S3 bucket.
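
To make this concrete, here is a toy sketch (my own, with invented names like Request and MESSAGE_DB) of what enforcing such an invariant in code looks like; the check runs identically on every request and never waits for a human to look at anything.

```python
# Toy illustration of a statically enforced invariant: a user can only read
# conversations they participate in, regardless of what IDs or text they submit.
# The request shape and data layout here are invented for illustration.
from dataclasses import dataclass


@dataclass
class Request:
    user_id: str
    conversation_id: str


# conversation_id -> (participants, messages)
MESSAGE_DB: dict[str, tuple[set[str], list[str]]] = {
    "conv-1": ({"alice", "bob"}, ["hi", "hello"]),
}


def get_messages(req: Request) -> list[str]:
    participants, messages = MESSAGE_DB[req.conversation_id]
    # The invariant lives here, in code, not in a human review step.
    if req.user_id not in participants:
        raise PermissionError("caller is not a participant in this conversation")
    return messages
```

Whatever inputs a caller crafts, the property "non-participants never see the messages" is enforced by the same few lines of code.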

This type of security is very "static": the invariants are enforced by code, with no human in the loop, so nothing relies on a human reacting properly to something. This is typically how you mitigate risks from your users: users typically only need certain restricted types of access to your systems, and they're very numerous, so you need to handle them with automated approaches. I sometimes call systems that aim to be safe no matter what inputs a user provides "manifestly safe".
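
Another standard example of this kind of manifest safety, using Python's built-in sqlite3 module (the schema here is invented for illustration): user text is passed to the database as a bound parameter, so no possible input can change the structure of the query.

```python
# User-supplied text is passed as data, never spliced into the SQL string itself,
# so no input can alter what the query does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, body TEXT)")


def create_post(author: str, body: str) -> None:
    # The "?" placeholders bind author/body as pure data.
    conn.execute("INSERT INTO posts (author, body) VALUES (?, ?)", (author, body))
    conn.commit()


create_post("alice", "'); DROP TABLE posts; --")  # stored as text; nothing executes
```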

In contrast, consider how Facebook and AWS relate to their engineers. These companies want to be robust to their engineers abusing their access in order to violate security properties, whether by abusing administrator privileges (e.g. direct access to machines that hold user data) or by adding malign behavior to code that they write. To mitigate these risks, companies use very different types of security techniques. They rely on multi-party authorization, where multiple engineers have to sign off before a sensitive command is executed, or on code review, where someone needs to review code before it's pushed to environments where it has access to sensitive data. These techniques fundamentally rely on human labor.
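
As a toy sketch of what multi-party authorization looks like in code (the class and command names are invented, and a real system would also need authenticated identities and durable audit logs), the key property is that no single engineer can execute the sensitive command alone.

```python
# Sketch of multi-party authorization: a sensitive command only runs once it has
# sign-off from at least N engineers other than the requester.
from dataclasses import dataclass, field


@dataclass
class SensitiveCommand:
    requested_by: str
    command: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, engineer: str) -> None:
        # The requester cannot approve their own request.
        if engineer != self.requested_by:
            self.approvals.add(engineer)

    def execute(self, required_approvals: int = 2) -> None:
        if len(self.approvals) < required_approvals:
            raise PermissionError(
                f"need {required_approvals} approvals, have {len(self.approvals)}"
            )
        print(f"running: {self.command}")


cmd = SensitiveCommand(requested_by="alice", command="export_user_table --shard 7")
cmd.approve("bob")
cmd.approve("carol")
cmd.execute()  # only now does the command run
```

Note that the scarce resource here is human attention: every approval is an engineer actually looking at the request.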

Insider threats are harder to handle in that your engineers need much more fine-grained access than your users do, but easier in that your engineers are far less numerous than your users. Moreover, insiders don't have as much ability to iterate against your security measures: if you catch them trying to break those measures, you can often fire them or have them arrested.

Of course, there's a spectrum here. A big part of how you mitigate threats from external hackers is having security staff investigate weird anomalies, in the hope of catching hackers who have broken through some of your intended security boundaries. And a big part of how you mitigate insider threats is building safe administrative APIs, so that instead of your employees needing the ability to run arbitrary code on production machines, they can usually do their jobs by taking pre-specified, known-safe actions. And there’s much more along these lines.
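
Here is a minimal sketch of the "safe administrative API" idea (the action names are hypothetical, and a real version would authenticate the caller and log every invocation): routine operations go through a small allowlist of known-safe actions rather than a shell on a production machine.

```python
# Sketch of a safe administrative API: operators get a small menu of
# pre-specified, known-safe actions instead of arbitrary code on prod machines.
SAFE_ADMIN_ACTIONS = {
    "restart_service": lambda name: print(f"restarting {name} via the orchestrator"),
    "rotate_logs": lambda name: print(f"rotating logs for {name}"),
    "resync_replica": lambda name: print(f"resyncing read replica for {name}"),
}


def run_admin_action(action: str, target: str) -> None:
    handler = SAFE_ADMIN_ACTIONS.get(action)
    if handler is None:
        # Anything outside the allowlist is refused; there is no
        # "run arbitrary command" escape hatch for routine work.
        raise PermissionError(f"{action!r} is not an approved administrative action")
    handler(target)


run_admin_action("restart_service", "billing-frontend")
```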

I feel like there’s a really important distinction between “security from insiders” and “security from outsiders”, and I don’t really get why security people don’t seem to find this distinction as important as I do.

Now that I have this concept, I can make a few points about the risks posed by AIs deployed inside AI companies. (Risks from internal deployments have been discussed by me here (as well as many other places), by a Redwood/UK AISI collaboration here, and by Apollo here.)

First, when I talk to security staff at AI companies about computer security, they often seem to fail to anticipate what insider threat from AIs will be like. In particular, when discussing how they'll deploy AI engineers and researchers internally, they often seem to imagine relating to those AIs mostly as outsiders rather than as insiders. They imagine writing small and carefully locked-down infrastructure and allowing the AIs to interact with it. But my guess is that in order to do their jobs, AIs will need the kind of wide-ranging access that insiders have, which means an attack surface that is large and can’t be protected with purely static methods. (I’m not sure I’m right about this, and it seems very important to get clarity on; I plan to write more about it soon.) I think the security teams at AI companies are not putting enough work into preparing for this.
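
For concreteness, here is roughly the picture I think they have in mind, sketched as a tool allowlist (the tool names and paths are invented): the AI can only take a handful of pre-approved actions on pre-approved paths. My worry above is that the work we actually want AIs to do won’t fit inside a surface this small.

```python
# A sketch of the "small, carefully locked-down surface" picture: the agent can
# only invoke allowlisted tools, each restricted to narrow path prefixes.
# Tool names and paths are hypothetical.
ALLOWED_TOOLS: dict[str, tuple[str, ...]] = {
    "read_file": ("docs/", "src/experiments/"),
    "run_tests": ("src/experiments/",),
    "open_pull_request": (),  # takes no path argument
}


def check_tool_call(tool: str, path: str = "") -> None:
    allowed_prefixes = ALLOWED_TOOLS.get(tool)
    if allowed_prefixes is None:
        raise PermissionError(f"tool {tool!r} is not exposed to the agent")
    if allowed_prefixes and not path.startswith(allowed_prefixes):
        raise PermissionError(f"{tool!r} may not touch {path!r}")


check_tool_call("read_file", "src/experiments/sweep.py")   # allowed
# check_tool_call("read_file", "infra/prod_secrets.yaml")  # would raise
```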

And second, insider threat from AI combines the hard parts of both problems! As with insider threats, the AIs will need extremely broad, fine-grained access. And as with outsider threats, you're outnumbered by your potential adversaries. (Of course, you have a bunch of advantages too, and that's why we research AI control rather than just writing off the problem as hopeless.)
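
As one example of the kind of advantage I mean, here is a toy sketch in the spirit of the AI control work linked above (the scoring function is a stand-in, not a real monitor): an automated monitor scores each proposed agent action, automation handles the clearly benign bulk, and scarce human reviewers only see the suspicious tail.

```python
# Toy sketch of a monitor-and-escalate control pattern. A real monitor might be
# another model plus static rules; this heuristic is only a placeholder.
def suspicion_score(action: str) -> float:
    risky_markers = ("prod", "credentials", "curl", "scp")
    return sum(marker in action for marker in risky_markers) / len(risky_markers)


def route_action(action: str, threshold: float = 0.25) -> str:
    if suspicion_score(action) < threshold:
        return "auto-approve"       # handled by automation, scales to many agents
    return "escalate-to-human"      # expensive human review reserved for the tail


print(route_action("pytest src/experiments/"))        # auto-approve
print(route_action("scp model_weights prod:/tmp/"))   # escalate-to-human
```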