(I expect that the point of this post is already obvious to many of the people reading it. Nevertheless, I believe that it is good to mention important things even if they seem obvious.)

OpenAI, DeepMind, Anthropic, and other AI organizations focused on capabilities, should shut down. This is what would maximize the utility of pretty much everyone, including the people working inside of those organizations.

Let's define a Powerful AI ("PAI") as an AI system capable of either:

  • Steering the world towards what it wants hard enough that it can't be stopped.
  • Killing everyone "un-agentically", e.g. by being plugged into a protein printer and generating a supervirus.

and by "aligned" (or "alignment") I mean the property of a system that, when it has the ability to {steer the world towards what it wants hard enough that it can't be stopped}, what it wants is nice things and not goals that entail killing literally everyone (which is the default).

We do not know how to make a PAI which does not kill literally everyone. OpenAI, DeepMind, Anthropic, and others are building towards PAI. Therefore, they should shut down, or at least shut down all of their capabilities progress and focus entirely on alignment.

"But China!" does not matter. We do not know how to build PAI that does not kill literally everyone. Neither does China. If China tries to build AI that kills literally everyone, it does not help if we decide to kill literally everyone first.

"But maybe the alignment plan of OpenAI/whatever will work out!" is wrong. It won't. It might work if they were careful enough and had enough time, but they're going too fast and they'll simply cause literally everyone to be killed by PAI before they would get to the point where they can solve alignment. Their strategy does not look like that of an organization trying to solve alignment. It's not just that they're progressing on capabilities too fast compared to alignment; it's that they're pursuing the kind of strategy which fundamentally gets to the point where PAI kills everyone before it gets to saving the world.

Yudkowsky's Six Dimensions of Operational Adequacy in AGI Projects describes an AGI project with an adequate alignment mindset as one where:

The project has realized that building an AGI is mostly about aligning it. Someone with full security mindset and deep understanding of AGI cognition as cognition has proven themselves able to originate new deep alignment measures, and is acting as technical lead with effectively unlimited political capital within the organization to make sure the job actually gets done. Everyone expects alignment to be terrifically hard and terribly dangerous and full of invisible bullets whose shadow you have to see before the bullet comes close enough to hit you. They understand that alignment severely constrains architecture and that capability often trades off against transparency. The organization is targeting the minimal AGI doing the least dangerous cognitive work that is required to prevent the next AGI project from destroying the world. The alignment assumptions have been reduced into non-goal-valent statements, have been clearly written down, and are being monitored for their actual truth.

(emphasis mine)

Needless to say, this is not remotely what any of the major AI capabilities organizations look like.

At least Anthropic didn't particularly try to be a big commercial company making the public excited about AI. Making the AI race a big public thing was a huge mistake on OpenAI's part, and is evidence that they don't really have any idea what they're doing.

It does not matter that those organizations have "AI safety" teams, if their AI safety teams do not have the power to take the one action that has been the obviously correct one this whole time: shut down progress on capabilities. If their safety teams have not done this so far, when it is the one thing that needs to be done, there is no reason to think they'll have the chance to take whatever would be the second-best or third-best actions either.

This isn't just about the large AI capabilities organizations. I expect that there are plenty of smaller organizations out there headed towards building unaligned PAI. Those should shut down too. If these organizations exist, it must be because the people working there think they have a real chance of making some progress towards more powerful AI. If they're right, then that's real damage to the probability that anyone at all survives, and they should shut down as well in order to stop doing that damage. It does not matter if you think you have only a small negative impact on the probability that anyone survives at all — the actions that maximize your utility are the ones that decrease the probability that PAI kills literally everyone, even if it's just by a small amount.

Organizations which do not directly work towards PAI but provide services that are instrumental to it — such as EleutherAI, HuggingFace, etc — should also shut down. It does not matter if your work only contributes "somewhat" to PAI killing literally everyone. If the net impact of your work is a higher probability that PAI kills literally everyone, you should "halt, melt, and catch fire".

If you work at any of those organizations, your two best options to maximize your utility are to find some way to make that organization slower at getting to PAI (e.g. by advocating for more safety checks that slow down progress, and by yourself being totally unproductive at technical work), or to quit. Stop making excuses and start taking the correct actions. We're all in this together. Being part of the organization that kills everyone will not do much for you — all you get is a bit more wealth-now, which is useless if you're dead and useless if alignment is solved and we get utopia.


Agreed.

In addition: I expect one of the counter-arguments to this would be "if these labs shut down, more will spring up in their place, and nothing would change".

Potentially-hot take: I think that's actually a much lesser concern than it might seem.

The current major AGI labs are led by believers. My understanding is that quite a few (all?) of them bought into the initial LW-style AGI Risk concerns, and founded these labs as a galaxy-brained plan to prevent extinction and solve alignment. Crucially, they aimed to do that well before the talk of AGI became mainstream. They did it back in the days where "AGI" was a taboo topic due to the AI field experiencing one too many AI winters.

They also did that in defiance of profit-maximization gradients. Back in the 2010s, "AGI research" may have sounded like a fringe but tolerable research topic, but certainly not like something that would have invited much investor/market hype.

And inasmuch as humanity is still speeding up towards AGI, I think that's currently still mostly spearheaded by believers. Not by raw financial incentives or geopolitical races. (Yes, yes, LLMs are now all the hype, and I'm sure the military loves to put CNNs on their warheads' targeting systems, or whatever it is they do. But LLMs are not AGI.)

Outside the three major AGI labs, I'm reasonably confident no major organization is following a solid roadmap to AGI; no-one else woke up. A few LARPers, maybe, who'd utter "we're working on AGI" because that's trendy now. But nobody who has a gears-level model of the path there, and what its endpoint entails.

So what would happen if OpenAI, DeepMind, and Anthropic shut down just now? I'm not confident, but I'd put decent odds on the vision of AGI going the way great startup ideas go. There won't necessarily be anyone who'd step in to replace them. There'd be companies centered around scaling LLMs in the most brute-force manner possible, but I'm reasonably sure that's mostly safe.

The business world, left to its own devices, would meander around to developing AGI eventually, yes. But the path it'd take there might end up incremental and circuitous, potentially taking a few decades more. Nothing like the current determined push.

... Or so goes my current strong-view-weakly-held.

Outside the three major AGI labs, I'm reasonably confident no major organization is following a solid roadmap to AGI; no-one else woke up. A few LARPers, maybe, who'd utter "we're working on AGI" because that's trendy now. But nobody who has a gears-level model of the path there, and what its endpoint entails.

This seems pretty false. In terms of large players, there also exist Meta and Inflection AI. There are also many other smaller players who care about AGI, and no doubt many AGI-motivated workers at the three labs mentioned would start their own orgs if the org they're currently working under shuts down.

Inflection's claim to fame is having tons of compute and promising to "train models that are 10 times larger than the cutting edge GPT-4 and then 100 times larger than GPT-4", plus the leader talking about "the containment problem" in a way that kind-of palatably misses the point. So far, they seem to be precisely the sort of "just scale LLMs" vision-less actor I'm not particularly concerned about. I could be proven wrong any day now, but so far they don't really seem to be doing anything interesting.

As to Meta – what's the last original invention they made? Last I checked, they couldn't even match GPT-4, with all of Meta's resources. Yann LeCun has thoughts on AGI, but it doesn't look like he's being allowed to freely and efficiently pursue them. That seems to be what a vision-less major corporation investing in AI looks like. Pretty unimpressive.

Current AGI lab employees metastasizing across the ecosystem and potentially founding new labs if shut down – I agree that it may be a problem, but I don't think they necessarily coalesce into more AGI labs by default. Some of them have research skills but no leadership/management skills, for example. So while they'd advance towards an AGI when embedded into a company with this vision, they won't independently start one up if left to their own devices, nor embed themselves into a different project and hijack it towards AGI-pursuit. And whichever of them do manage that – they'd be unlikely to coalesce into a single new organization, meaning the smattering of new orgs would still advance slower collectively, and each may have more trouble getting millions/billions in funding unless the leadership are also decent negotiators.


meaning the smattering of new orgs would still advance slower collectively, and each may have more trouble getting millions/billions of funding unless the leadership are also decent negotiators

This seems to contradict history. The breakup of Standard Oil, for example, led to innovations in oil drilling. Also, you are seriously overestimating how hard it is to get funding. Much stupider and more poorly run companies have gotten billions in funding. And, in the worst case, these leaders can just hire negotiators.

Presumably these innovations were immediately profitable. I'm not sure that moves towards architectures closer to AGI (as opposed to myopic/greedy-search moves towards incrementally-more-capable models) are immediately profitable. That would become increasingly true as we inch closer to AGI, but it definitely wasn't true back in the 2010s, and it may not be true yet.

So I'm sure some of them would intend to try innovations that'd inch closer to AGI, but I expect them not to be differentially more rewarded by the market. Meaning that, unless one of these AGI-focused entrepreneurs is also really good at selling their pitch to investors (or has the right friend, or enough money and competence-recognition ability to get a co-founder skilled at making such pitches), then they'd be about as well-positioned to rush to AGI as some of the minor AI labs today are. Which is to say, not all that well-positioned at all.

you are seriously overestimating how hard it is to get funding

You may not be taking into account the market situation immediately after the major AI labs' hypothetical implosion. It'd be flooded with newly-unemployed ML researchers trying to found new AI startups or something; investor demand for that might well end up saturated (especially if the major labs' shutdown cools the hype down somewhat). And then it's a question of which ideas are differentially more likely to get funded; and, as per above, I'm not sure it's the AGI-focused ones.

I think this is basically correct and I'm glad to see someone saying it clearly.

It's useful to separately consider extinction and disempowerment. It's not an unusual position that the considered decision of an AGI civilization is to avoid killing everyone. This coexists with a possibly much higher probability of expected disempowerment. (For example, my expectation for the next few years while the LLMs are scaling is 90% disempowerment and 30% extinction, conditional on AGI in that timeframe, with most of the extinction being misuse or rogue AGIs that would later regret this decision or don't end up representative in the wider AGI civilization. Extinction gets more weight with AGIs that don't rely on human datasets as straightforwardly.)

I think the argument for shutdown survives replacement of extinction with disempowerment-or-extinction, which is essentially the meaning of existential risk. Disempowerment is already pretty bad.

The distinction can be useful for reducing the probability of extinction-given-disempowerment, by trying to make frontier AI systems pseudokind. This unfortunately gives another argument for competition between labs, given the general pursuit of disempowerment of humanity.

TL;DR: Private AI companies such as Anthropic, which have revenue-generating products and also invest heavily in AI safety, seem like the best type of organization for doing AI safety research today. This would not be the best option in an ideal world, and maybe not in the future, but right now I think it is.

I appreciate the idealism, and I'm sure there is some possible universe where shutting down these labs would make sense, but I'm quite unsure whether doing so would actually be net-beneficial in our world; I think there's a good chance it would be net-negative in reality.

The most glaring constraint is finances. AI safety is funding-constrained, so this is worth mentioning. Companies like DeepMind and OpenAI spend hundreds of millions of dollars per year on staff and compute, and I doubt that would be possible in a non-profit. Most of the non-profits working on AI safety (e.g. Redwood Research) are small, with just a handful of people. OpenAI changed their structure from a non-profit to a capped for-profit because they realized that being a non-profit would have been insufficient for scaling their company and spending. OpenAI now generates $1 billion in revenue, and I think it's pretty implausible that a non-profit could generate that amount of income.

The other alternative apart from for-profit companies and philanthropic donations is government funding. It is true that governments fund a lot of science. For example, the US government funds 40% of basic science research. And a lot of successful big science projects such as CERN and the ITER fusion project seem to be mostly government-funded. However, I would expect a lot of government-funded academic AI safety grants to be wasted by professors skilled at putting "AI safety" in their grant applications so that they can fund whatever they were going to work on anyway. Also, the fact that the US government has secured voluntary commitments from AI labs to build AI safely gives me the impression that governments are either unwilling or incapable of working on AI safety and instead would prefer to delegate it to private companies. On the other hand, the UK has a new AI safety institute and a language model task force.

Another key point is research quality. In my opinion, the best AI safety research is done by the big labs. For example, Anthropic created constitutional AI, and they also seem to be a leader in interpretability research. Empirical AI safety work and AI capabilities work involve very similar skills (coding etc.), so it's not surprising that leading AI labs also do the best empirical AI safety work.

There are several other reasons why big AI labs do the best empirical AI safety work. One is talent: top labs have the money to pay high salaries, which attracts top talent. Work in big labs also seems more collaborative than in academia, which seems important for large projects; many top projects have dozens of authors (e.g. the Llama 2 paper). Finally, there is compute. Right now, only big labs have the infrastructure necessary to do experiments on leading models. Experiments such as fine-tuning large models require a lot of money and hardware. For example, this paper by DeepMind on reducing sycophancy apparently involved fine-tuning the 540B PaLM model, which is probably not possible for most independent and academic researchers right now; consequently, they usually have to work with smaller models such as Llama-2-7b. However, the UK is investing in some new public AI supercomputers, which hopefully will level the playing field somewhat.

If you think theoretical work (e.g. agent foundations) is more important than empirical work, then big labs have less of an advantage. Though DeepMind is doing some of that too.