My response to the alignment / AI representatives proposals:
Even if AIs are "baseline aligned" to their creators, this doesn't automatically mean they are aligned with broader human flourishing or capable of compelling humans to coordinate against systemic risks. For an AI to effectively say, "You are messing up, please coordinate with other nations/groups, stop what you are doing" requires not just truthfulness but also immense persuasive power and, crucially, human receptiveness. Even if pausing AI were the correct thing to do, Claude is not going to suggest it to Dario, for obvious reasons. As we've seen even with entirely human systems (the Trump administration and tariffs), possessing information or even offering correct advice doesn't guarantee it will be heeded or lead to effective collective action.
[...] "Politicians...will remain aware...able to change what the system is if it has obviously bad consequences." The climate change analogy is pertinent here. We have extensive scientific consensus, an "oracle IPCC report", detailing dire consequences, yet coordinated global action remains insufficient to meet the scale of the challenge. Political systems can be slow, captured by short-term interests, or unable to enact unpopular measures even when long-term risks are "obviously bad." The paper [gradual disempowerment] argues AI could further entrench these issues by providing powerful tools for influencing public opinion or creating economic dependencies that make change harder.
Extract copy-pasted from a longer comment here.
I don't see how these help.
First, it seems to me that interoperability + advisors would be useless for helping people sociopolitically maneuver precisely up until the point the AI models are good enough to disempower most of them. Imagine some not-particularly-smart person with no AI expertise and not much mental slack for fighting some abstract battle about the future of humanity. A demographic of those people is then up against the legal teams of major corporations and the departments of major governments navigating the transition. The primary full-time jobs of the people working in the latter groups, at which they'd be very skilled, would be figuring out how to disempower the former demographics. Under what circumstances do the at-risk demographics stand any chance?
Well, if the AI models are good enough that the skills and the attentions of the humans deploying them don't matter. Which is to say: past the point at which the at-risk demographics are already disempowered.
Conversely, if the AI capabilities are not yet there, then different people can use AIs more or less effectively depending on how smart and skilled they are and how many resources and how much spare attention they have for the fight. In that case, the extant powers are massively advantaged and win in expectation.
Second, I'm skeptical about feasibility. This requires, basically, some centralized distribution system which (1) faithfully trains AI models to be ultimately loyal to their end advisees, (2) fully subsidizes the compute costs for serving these models to all the economically useless people (... in the US? ... in the world?). How is that centralized system not subverted/seized by the extant powers? (E.g., the government insisting, in a way that sounds surface-level reasonable, on ultimate loyalty to its laws first, which it can then freely rewrite to have complete control.)
Like, suppose this system is set up some time before AI models are good enough to make human workers obsolete. As per the first point, the whole timespan prior to the AI models becoming that good would involve the entrenched powers successfully scheming towards precisely the gradual disempowerment we're trying to prevent. How do we expect this UBI-like setup[1] to still be in place, uncorrupted, by the point AI models become good enough, given that it is the largest threat to the ability of extant powers to maintain/increase their power? This would be the thing everyone in power is trying to dismantle. (And, again, unless the AIs are already good enough to disempower the risk demographics, the risk demographics plus their AI representatives would be massively worse at fighting this battle than their opponents plus their AI representatives.)
Third: Okay, let's suppose the system is somehow in place, everyone in the world has a loyal AI representative advising them and advocating for their interests. What are these representatives supposed to do? Like, imagine some below-average-intelligence menial-labor worker hopelessly outclassed by robots. What moves is their AI representative supposed to make to preserve their power and agency? They don't really have any resources to bargain with; it's the ground truth.
Will the AIs organize them to march to the military bases/datacenters/politicians' and CEOs' homes and physically seize the crucial resources/take the key players hostage, or what?
There was a thought experiment going around on Twitter a few months back, which went:
Suppose people have to press one of two buttons, Blue and Red. If more than 50% of people press Blue, nobody dies. If more than 50% press Red, everyone who pressed Blue dies. Which button do you press?
Red is guaranteed to make you safe, and in theory, if everyone pressed Red, everyone would be safe. But if we imagine something isomorphic to this happening in reality, we should note that getting 100% of people to make the correct choice is incredibly hard. Some would be distracted, or confused, or they'd slip and fall and smack the wrong button accidentally.
To me, all variations on "let's give everyone in the world an AGI advisor to prevent disempowerment!" read as "let's all play Red!". The details don't quite match (50% survival/non-disempowerment rate is not at all guaranteed), but the vibe and the failure modes are the same.
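As a rough illustration (not from the original discussion), here is a minimal sketch in Python simulating the button game when everyone intends to press Red but each person independently slips and presses Blue with some small error rate. The population size and error rates are made-up numbers for illustration only.

```python
import random


def simulate(n_people: int, error_rate: float, seed: int = 0) -> int:
    """Simulate one round of the Blue/Red button game.

    Everyone *intends* to press Red, but each person independently
    presses Blue by mistake with probability `error_rate` (distraction,
    confusion, slipping onto the wrong button, etc.).

    Returns the number of people who die under the stated rules:
    - if more than 50% press Blue, nobody dies;
    - otherwise, everyone who pressed Blue dies.
    """
    rng = random.Random(seed)
    blue_pressers = sum(rng.random() < error_rate for _ in range(n_people))
    if blue_pressers > n_people / 2:
        return 0  # the Blue coalition crossed 50%, so everyone is safe
    return blue_pressers  # the mistaken Blue pressers die


if __name__ == "__main__":
    # Assumed numbers: even a 1% slip-up rate in a population of
    # 1,000,000 leaves ~10,000 casualties under the "let's all play Red"
    # plan, despite the plan being "safe" on paper.
    for error_rate in (0.0, 0.001, 0.01, 0.05):
        deaths = simulate(n_people=1_000_000, error_rate=error_rate, seed=42)
        print(f"error rate {error_rate:.1%}: {deaths} deaths")
```

Unless the slip-ups happen to exceed 50% (at which point nobody dies), every mistaken Blue presser is a casualty, which is the failure mode the analogy is pointing at: a plan that is only safe if essentially everyone executes it correctly.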
Which is also supposed to be implemented during Trump's term?
Note: as a general policy I'm not planning on engaging with the comments here, this is just because I don't want to spend a bunch of time on this topic and this could easily eat a bunch of time. Sorry about that.
Do states and corporations also have their aligned representatives? Is the cognitive power of the representatives equal, roughly equal, or wildly unequal? If it is unequal, why are the resulting equilibria pro-human? (I.e., if I imagine individual humans like me represented by, e.g., GPT-4 while the government runs tens of thousands of o4s, I would expect my representative to get convinced of whatever the government wants.)
There have recently been various proposals for mitigations to "the intelligence curse" or "gradual disempowerment"—concerns that most humans would end up disempowered (or even dying) because their labor is no longer valuable. I'm currently skeptical that the typically highlighted interventions and prioritization are best, and I have some alternative proposals for relatively targeted/differential interventions which I think would be more leveraged (as in, the payoff is higher relative to the difficulty of achieving them).
It's worth noting I doubt that these threats would result in huge casualty counts (due to e.g. starvation) or disempowerment of all humans (though substantial concentration of power among a smaller group of humans seems quite plausible).[1] I decided to put a bit of time into writing up my thoughts out of general cooperativeness (e.g., I would want someone in a symmetric position to do the same).
(This was a timeboxed effort of ~1.5 hr, so apologies if it is somewhat poorly articulated or otherwise bad. Correspondingly, this post is substantially lower effort than my typical post.)
My top 3 preferred interventions focused on these concerns are:
Some things which help with above:
Implicit in my views is that the problem would be mostly resolved if people had aligned AI representatives which helped them wield their (current) power effectively.
To be clear, something like these interventions has been highlighted in prior work, but I have a somewhat different emphasis and prioritization and I'm explicitly deprioritizing other interventions.
Deprioritized interventions and why:
(I'm not discussing interventions targeting misalignment risk, biorisk, or power grab risk, as these aren't very specific to this threat model.)
Again, note that I'm not particularly recommending these interventions on my views about the most important risks, just claiming these are the best interventions if you're worried about "intelligence curse" / "gradual disempowerment" risks.
That said, I do think that technical misalignment issues are pretty likely to disempower all humans, and I think war, terrorism, or the accidental release of homicidal bioweapons could kill many people. That's why I focus on misalignment risks.