I'm going to collect here new papers that might be relevant:
My response to the alignment / AI representatives proposals:
Even if AIs are "baseline aligned" to their creators, this doesn't automatically mean they are aligned with broader human flourishing or capable of compelling humans to coordinate against systemic risks. For an AI to effectively say, "You are messing up, please coordinate with other nations/groups, stop what you are doing" requires not just truthfulness but also immense persuasive power and, crucially, human receptiveness. Even if pausing AI was the correct thing to do, Claude is not going to suggest this to Dario for obvious reasons. As we've seen even with entirely human systems (Trump’s Administration and Tariff), possessing information or even offering correct advice doesn't guarantee it will be heeded or lead to effective collective action.
[...] "Politicians...will remain aware...able to change what the system is if it has obviously bad consequences." The climate change analogy is pertinent here. We have extensive scientific consensus, an "oracle IPCC report", detailing dire consequences, yet coordinated global action remains insufficient to meet the scale of the challenge. Political systems can be slow, captured by short-term interests, or unable to enact unpopular measures even when long-term risks are "obviously bad." The paper [gradual disempowerment] argues AI could further entrench these issues by providing powerful tools for influencing public opinion or creating economic dependencies that make change harder.
Extract copy pasted from a longer comment here.
While I concur that power concentration is a highly probable outcome, I believe complete disempowerment warrant deeper consideration, even under the assumptions you've laid out. Here are some thoughts on your specific points:
On "No hot global war":
You express hope that we won't enter a situation where a humanity-destroying conflict seems plausible.
Responding to your thoughts on why the feedback loops might be less likely if your three properties hold:
As a meta point, the fact that the quantity and quality of discourse on this matter is so low, and the fact that people are continuing to say “LET’S GO WE ARE CREATING POWERFUL AIS, and don't worry, we plan to align them, even if we don't really know which type of alignment do we really need, and if this is even doable in time” while we have not rigorously assessed all those risks, is really not a good sign.
At the end of the day, my probability for something in the ballpark of gradual disempowerment / extreme power concentration and loss of democracy is 40%-ish, much higher than scheming (20%) leading to direct takeover (let’s say 10% post mitigation like control).
Thanks a lot for writing this, this is an important consideration, and it would be sweet if Anthropic updated accordingly.
Some remarks:
OK, thanks a lot, this is much clearer. So basically most humans lose control, but some humans keep control.
And then we have this meta-stable equilibrium that might be sufficiently stable, where humans at the top are feeding the other humans with some kind of UBI.
For me, this is not really desirable - the power is probably going to be concentrated into 1-3 people, there is a huge potential for value locking, those CEOs become immortal, we potentially lose democracy (I don't see companies or US/China governments as particularly democratic right now), the people on the top become potentially progressively corrupted as is often the case. Hmm.
Then, is this situation really stable?
But at least I think that I can see now how we could still live for a few more decades under the authority of a world dictator/pseudo-democracy while this was not clear for me beforehand.
Thanks for continuing to engage. I really appreciate this thread.
"Feeding humans" is a pretty low bar. If you want humans to live as comfortably as today, this would be more like 100% of GDP - modulo the fact that GDP is growing.
But more fundamentally, I'm not sure the correct way to discuss the resource allocation is to think at the civilization level rather than at the company level: Let's say that we have:
It seems to me that if you are an investor, you will give your money to B. It seems that in the long term, B is much more competitive, gains more money, is able to reduce its prices, nobody buys from A, and B invests this money into more automated-humans and crushes A and A goes bankrupt. Even if alignment is solved, and the humans listen to his AIs, it's hard to be competitive.
a tiny fraction of resources toward physically keeping the humans alive (which is very cheap, at least once AIs are very powerful)
I'm not sure it's very cheap.
It seems to me that for the same amount of energy and land you need for a human, you could replace a lot more economically valuable work with AI.
Sure, at some point keeping humans alive is a negligible cost, but there's a transition period while it's still relatively expensive - and that's part of why a lot of people are going to be laid off - even if the company ends up getting super rich.
Well done - this is super important. I think this angle might also be quite easily pitchable to governments.
Thanks a lot!
We think a relatively inexpensive method for day-to-day usage would be using Sonnet to monitor Opus, or Gemini 2.5 Flash to monitor Pro. This would probably be just a +10% overhead. But we have not run this exact experiment; this would be a follow-up work.